WO2005001764A1

WO2005001764A1 - Image input device, robot, and program

Info

Publication number: WO2005001764A1
Application number: PCT/JP2004/009193
Authority: WO
Inventors: Kyoji Hirata
Original assignee: Nec Corporation
Priority date: 2003-06-30
Filing date: 2004-06-30
Publication date: 2005-01-06

Abstract

An image of an object is input by image input means (1). Mark image generation means (2) outputs a mark image indicating the position where the object eye position is to be arranged on the display image. The mark image and the input image are superimposed by image superimposition means (3) and the superimposed image is displayed on display means (4). A user corrects the zoom in/zoom out function of the image input means (1) and the relative position between the image input means (1) and the object, thereby adjusting the superimposed video in such a manner that the eye pupil position is superimposed on the mark image position. After the adjustment, instruction means (5) instructs image input acquisition and the input image at the moment is recorded in image storage means (6).

Description

Specification

Image input device, robot and program

Technical field

[0001] The present invention relates to an image input technique, and in particular to an image input device that appropriately inputs and records an object, or a specific portion of the object, so that the object or the specific portion can be recognized by image analysis, and to a robot using the image input device and a program.

Background art

[0002] In recent years, many techniques for recognizing a subject (object) of an image captured using a camera or a video have been proposed.

[0003] One technique for recognizing an object in an image stores a feature amount of the object to be recognized in a database or the like in advance, and compares this information with a feature amount extracted from the input image (for example, see Patent Document 1).

There is also a method in which information of a template representing a subject is stored in advance, and the information is compared with a feature amount extracted from an input image (for example, see Patent Document 2).

[0004] In such recognition technology, it is necessary to extract a subject or a feature amount of the subject from an input image in a form that can be compared with a template or a feature amount in a database. Correct detection and segmentation of the subject was a major issue.

In order to solve this problem, several techniques have been proposed for detection of a subject (object) or a specific portion of the subject (object).

[0005] For example, for the case where the subject (object) is a human face, a technique has been proposed that performs automatic discrimination by estimating the range in which the eyes are present from a binary image, specifying eye candidates within that range, and determining the eyes from the ratio of thin line portions in the multi-valued image (see, for example, Patent Document 3).

In addition, a technique has been proposed in which the direction of a face is automatically extracted and recognition is performed according to the direction of the face (for example, see Patent Document 4).

[0006] Patent Document 1: Japanese Patent Application Laid-Open No. 2001-16579

Patent Document 2: JP-A-11-144057

Patent Document 3: JP 2002-331172 A

Patent Document 4: JP-A-2002-288670

Disclosure of the invention

Problems to be solved by the invention

However, the technology for automatically detecting a subject as described above is still immature in many respects and is subject to many restrictions. To detect a subject automatically under such restrictions, the object must be photographed in a state where its position, posture, size, lighting conditions, and so on are appropriate. For example, the object cannot be detected properly when it is small, when it lies in a corner of the image, or when a specific part is photographed in a state unsuitable for recognition (for example, when the person is photographed wearing sunglasses or the like).

[0008] In addition, the photographer often does not know in what form the object to be recognized should be photographed, for example at what angle or size. In many cases, an image was shot in such a way that the specific portion could not be detected.

[0009] Furthermore, current automatic detection processing places a heavy processing load on small mobile information processing terminals such as mobile phones and PDAs, so performing the extraction on such terminals is not realistic.

Therefore, the present invention has been made in view of the above problems, and an object of the present invention is to provide an image input technology capable of photographing and recording an object, or a specific portion of the object, arranged at an appropriate position, direction, or orientation in the entire image.

[0011] A further object of the present invention is to provide an image input technique that photographs and records an object, or a specific portion of the object, arranged at an appropriate position, direction, or orientation in the entire image, and that also records information such as the identification or position of the object or the portion, so that the object or the specific portion can be cut out and extracted and the accuracy of recognition processing can be improved.

Means for solving the problem

[0012] The image input device of the present invention comprises display means for displaying an image of an object to be photographed, and mark superimposing display means for displaying on the display means, superimposed on the image of the object, a mark image indicating the position at which the object or a specific portion of the object is to be arranged.

In one configuration example of the image input device of the present invention, the image of the target is an image input for recognizing the target by image analysis.

Further, in one configuration example of the image input device of the present invention, the mark superimposing display means includes: storage means for storing a plurality of mark images corresponding to objects to be recognized or specific portions of the objects; selecting means for selecting, from the plurality of mark images stored in the storage means, a mark image suitable for the object to be recognized or the specific portion of the object; and superimposing display means for displaying the mark image selected by the selecting means superimposed on the image of the object.

In one configuration example of the image input device of the present invention, the mark image is an image that specifies the location of a target or a specific portion of the target with one index image. In one configuration example of the image input device of the present invention, the mark image is an image that specifies the location of a target object or a specific portion of the target object using a plurality of index images. In one configuration example of the image input device of the present invention, the mark image is an image that specifies one target object or a specific portion of the target object in the image.

In one configuration example of the image input device of the present invention, the mark image is an image that specifies a plurality of objects or specific portions of the objects in the image.

In one configuration example of the image input device of the present invention, the mark superimposing display means has a mark image moving means for moving a display position of a mark image.

In one configuration example of the image input device according to the present invention, the mark superimposing display means has a mark image adjusting means for adjusting a size of a mark image.

In one configuration example of the image input device of the present invention, the mark superimposing display means has a mark image color changing means for changing a color of a mark image.

In one configuration example of the image input device according to the present invention, the mark superimposing display means has a mark image luminance adjusting means for adjusting the luminance of a mark image.

Further, in one configuration example of the image input device of the present invention, the mark superimposing display means further includes storage means for storing a description of the mark image, and mark image description display means for displaying the stored description of the mark image on the display means when the mark image is superimposed.

Further, one configuration example of the image input device of the present invention further includes an image pickup unit that images an object.

Further, one configuration example of the image input device of the present invention is characterized in that the imaging means and the display means are not housed in one housing.

Also, one configuration example of the image input device of the present invention includes instructing means for instructing storage of an image displayed on the display means, and image storage means for storing the image based on the instruction of the instructing means.

In one configuration example of the image input device of the present invention, the mark superimposing display means includes mark image type information storing means that stores, in association with each mark image, mark image type information for identifying that mark image. When an image is stored based on the instruction of the instructing means, the captured image and the mark image type information of the mark image used at the time of imaging are stored in the image storage means.

In one configuration example of the image input device of the present invention, the mark superimposing display means includes mark image display position information storing means that stores, in association with each mark image, mark display position information, which is information on the display position of the mark image. When an image is stored based on the instruction of the instructing means, the captured image and the mark display position information of the mark image used at the time of imaging are stored in the image storage means.

Further, in one configuration example of the image input device of the present invention, when an image is stored based on an instruction of the instructing means, an image in which the mark image is superimposed on the captured image is stored in the image storage means.

Further, in one configuration example of the image input device of the present invention, when an image is stored based on an instruction of the instructing means, the captured image and the mark image used at the time of imaging are stored separately in the image storage means.

Further, in one configuration example of the image input device of the present invention, the instructing means is configured to detect that the image of the object to be photographed has become still and to instruct storage of the image when the stillness is detected.

Further, one configuration example of the image input device of the present invention is characterized in that the image storage means is provided in a remote place where data can be transmitted and received with the image input device.

[0015] One configuration example of the image input device of the present invention further includes an image recognition unit that analyzes an image stored in the image storage unit and performs a target object recognition process. In one configuration example of the image input device of the present invention, the image recognition means is provided in a remote place where data can be transmitted and received with the image input device.

In one configuration example of the image input device of the present invention, the image recognition means is configured to refer to the mark image type information to identify the type of the object, or of the specific portion of the object, in the image to be analyzed, and to perform image analysis processing.

Further, in one configuration example of the image input device of the present invention, the image recognition means is configured to refer to the mark display position information to identify the position of the object, or of the specific portion of the object, in the image to be analyzed, and to perform image analysis processing.

In one configuration example of the image input device of the present invention, the image recognition means is configured to recognize the mark image in an image on which the mark image is superimposed, thereby specifying the object or the specific portion of the object in the image to be analyzed, and to perform image analysis processing.

Further, in one configuration example of the image input device of the present invention, the image recognition means compares a recorded image with a mark image to specify the object or the specific portion of the object in the image to be analyzed.

Further, the present invention is a robot equipped with the image input device.

Further, the present invention is an image input program for causing a computer to function as an image input device, the program comprising: a display step of displaying an image of an object to be photographed; and a mark superimposing display step of displaying, superimposed on the displayed image of the object, a mark image indicating the position at which the object or a specific portion of the object is to be arranged. In one configuration example of the image input program according to the present invention, the image of the object is an image input for recognizing the object by image analysis.

Further, in one configuration example of the image input program of the present invention, the mark superimposing display step includes: a selection step of selecting, from a plurality of mark images stored in the storage means, a mark image suitable for the object to be recognized or the specific portion of the object; and a superimposition display step of displaying the mark image selected in the selection step superimposed on the image of the object.

In one configuration example of the image input program of the present invention, the mark image is an image that specifies the location of a target object or a specific portion of the target object with one index image. In one configuration example of the image input program according to the present invention, the mark image is an image that specifies an arrangement of a target or a specific portion of the target with a plurality of index images. In one configuration example of the image input program of the present invention, the mark image is an image that specifies one target object or a specific portion of the target object in the image.

In one configuration example of the image input program according to the present invention, the mark image is an image that specifies a plurality of objects or specific portions of the objects in the image.

In one configuration example of the image input program of the present invention, the mark superimposing display step may include a mark image moving step of moving a display position of a mark image.

In one configuration example of the image input program according to the present invention, the mark superimposing display step includes a mark image adjusting step of adjusting a size of a mark image. In one configuration example of the image input program according to the present invention, the mark superimposing display step includes a mark image color changing step of changing a color of the mark image. In one configuration example of the image input program of the present invention, the mark superimposing display step includes a mark image luminance adjusting step of adjusting the luminance of the mark image.

In one configuration example of the image input program of the present invention, the mark superimposing display step includes a mark image description display step of also displaying, on the display means, the description of the mark image stored in the storage means when the mark image is superimposed and displayed. One configuration example of the image input program according to the present invention includes an instruction step of instructing storage of an image displayed on the display means, and a step of storing the image in the image storage means based on the instruction in the instruction step.

One configuration example of the image input program according to the present invention includes a step of storing in the image storage means, when an image is stored based on the instruction in the instruction step, the photographed image and mark image type information for identifying the mark image used at the time of photographing. One configuration example of the image input program of the present invention includes a step of storing in the image storage means, when an image is stored based on the instruction in the instruction step, the photographed image and mark display position information, which is information on the display position of the mark image used at the time of photographing.

Also, one configuration example of the image input program of the present invention includes a step of storing in the image storage means, when an image is stored based on the instruction in the instruction step, an image in which the mark image is superimposed on the captured image.

Also, one configuration example of the image input program of the present invention includes a step of storing the captured image and the mark image used at the time of imaging separately in the image storage means when an image is stored based on the instruction in the instruction step.

Further, in one configuration example of the image input program of the present invention, the instruction step includes a step of detecting that the image of the object to be captured has become still and instructing storage of the image when the stillness is detected.

Further, one configuration example of the image input program of the present invention further comprises an image recognition step of analyzing an image stored in the image storage means and performing recognition processing of the object or a specific portion of the object.

In one configuration example of the image input program of the present invention, the image recognition step includes a step of referring to the mark image type information to specify the type of the object, or of the specific portion of the object, in the image to be analyzed, and performing image analysis processing.

In one configuration example of the image input program of the present invention, the image recognition step includes a step of referring to the mark display position information to identify the position of the object, or of the specific portion of the object, in the image to be analyzed, and performing image analysis processing.

In one configuration example of the image input program of the present invention, the image recognition step includes a step of recognizing the mark image in the image on which the mark image is superimposed, thereby specifying the object or the specific portion of the object in the image to be analyzed, and performing image analysis processing.

In one configuration example of the image input program of the present invention, the image recognition step includes a step of comparing the recorded image with the mark image to identify the object or the specific portion of the object in the image to be analyzed, and performing image analysis processing.

Effects of the invention

According to the present invention, when photographing an object, the object and a specific portion of the object can be photographed and recorded at an appropriate position and size.

[0020] Further, at the time of image recording, information such as the position of the mark image and the meaning of the mark is also recorded, and this information is used when analyzing the image and recognizing the object or a specific part of the object. As a result, the object and the specific part of the object can be extracted with high accuracy, the time required to detect them can be greatly reduced, and the recognition accuracy can be greatly improved.
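As an illustration of how recorded mark position information could narrow the region that later analysis has to examine, the following is a minimal sketch only. The names `StoredRecord` and `crop_around_marks`, the list-of-rows image representation, and the fixed pixel margin are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed representation)


@dataclass
class StoredRecord:
    """Hypothetical record: a captured image plus the mark information saved with it."""
    image: Image
    mark_type: str                         # e.g. "eyes"; meaning of the mark
    mark_positions: List[Tuple[int, int]]  # (x, y) display position of each index image


def crop_around_marks(record: StoredRecord, margin: int) -> Image:
    """Cut out the bounding box of the recorded mark positions, widened by a margin.

    The analysis step can then work on this small region instead of
    searching the entire image for the object.
    """
    height = len(record.image)
    width = len(record.image[0])
    xs = [x for x, _ in record.mark_positions]
    ys = [y for _, y in record.mark_positions]
    left = max(0, min(xs) - margin)
    right = min(width - 1, max(xs) + margin)
    top = max(0, min(ys) - margin)
    bottom = min(height - 1, max(ys) + margin)
    return [row[left:right + 1] for row in record.image[top:bottom + 1]]
```

In this sketch the cropped region is found by simple arithmetic on the stored coordinates, which reflects the claimed reduction in detection time; how the specification's recognition means actually uses the information is described in the later embodiments.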

Brief Description of Drawings

FIG. 1 is a block diagram showing a configuration of an image input device according to a first embodiment of the present invention.

FIG. 2 is a diagram illustrating an example of a mark image according to the first embodiment of the present invention.

FIG. 3 is a block diagram showing a configuration of a mark image generating unit according to the first embodiment of the present invention.

FIG. 4 is a block diagram showing another configuration of the mark image generating means in the first embodiment of the present invention.

FIG. 5 is a block diagram showing another configuration of the mark image generating means in the first embodiment of the present invention.

FIG. 6 is a diagram showing another example of a mark image in the first embodiment of the present invention.

FIG. 7 is a block diagram illustrating a configuration of an image superimposing unit according to the first embodiment of the present invention.

FIG. 8 is a block diagram showing a configuration of an instruction unit according to the first embodiment of the present invention.

FIG. 9 is a diagram showing an example of an operation in the first embodiment of the present invention.

FIG. 10 is a diagram showing an example of another operation in the first embodiment of the present invention.

FIG. 11 is a diagram showing an example of a mark image according to a second embodiment of the present invention.

FIG. 12 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 13 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 14 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 15 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 16 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 17 is a diagram showing another example of a mark image in the second embodiment of the present invention.

FIG. 18 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 19 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 20 is a diagram showing another example of a mark image according to the second embodiment of the present invention.

FIG. 21 is a block diagram showing a configuration of a mark image generating unit according to a second embodiment of the present invention.

FIG. 22 is a block diagram showing a configuration of an image input device according to a third embodiment of the present invention.

FIG. 23 is a block diagram showing a configuration of an image input device according to a fourth embodiment of the present invention.

FIG. 24 is a block diagram showing a first mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 25 is a diagram for explaining a first mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 26 is a diagram for explaining a first mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 27 is a view for explaining a first mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 28 is a diagram for explaining a first mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 29 is a block diagram showing a second mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 30 is a view for explaining a second mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 31 is a block diagram showing a third mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 32 is a diagram for describing a third mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 33 is a diagram for explaining a third mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 34 is a block diagram showing another configuration of the third mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 35 is a diagram for explaining another configuration of the third mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 36 is a block diagram showing a fourth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 37 is a view for explaining a fourth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 38 is a block diagram showing a fifth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 39 is a view for explaining a fifth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 40 is a block diagram showing a sixth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 41 is a diagram for describing a sixth mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 42 is a block diagram showing a seventh mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 43 is a diagram for explaining a seventh mode of the image recognition means in the fourth embodiment of the present invention.

FIG. 44 is a block diagram showing a configuration of a computer according to a fifth embodiment of the present invention.

FIG. 45 is a diagram showing a robot according to a sixth embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

[First Embodiment]

A first embodiment of the present invention will be described. FIG. 1 is a block diagram of the image input device according to the first embodiment. As shown in FIG. 1, the image input device in the first embodiment includes: image input means 1 for photographing an object to be recognized; mark image generating means 2 for generating a mark image indicating the position at which the object to be recognized, or a specific portion of the object, should be arranged; image superimposing means 3 for combining the image obtained from the image input means 1 with the mark image obtained from the mark image generating means 2 to generate an image in which the mark image is superimposed on the input image; display means 4 for displaying the image generated by the image superimposing means 3 and indicating the photographed area (imaging area); instructing means 5 for instructing recording of the image; and image storage means 6 for receiving the image recording instruction from the instructing means 5 and recording the image from the image input means 1. The mark image generating means 2 and the image superimposing means 3 constitute mark superimposing display means.

Image input means (imaging means) 1 takes an image of an object (subject) to be recognized and inputs the image of the object. It is implemented as a camera.

[0024] The mark image generating means 2 generates a mark image indicating the position at which the object to be recognized, or a specific portion of the object, should be arranged when it is photographed by the image input means 1. The image superimposing means 3 displays the generated mark image on the display means 4 in a form superimposed on the image of the object. The shape and size of the mark image, the position of the mark image on the display means 4, and the like differ depending on the object to be recognized, the type of the specific portion of the object, and the shooting mode (angle at the time of shooting, etc.).

For example, when the object to be recognized is a human face, both eyes of a human can be considered as a specific part for recognizing the face. For that purpose, it is necessary to take pictures so that the positions of both eyes can be understood.

Therefore, the mark image generating means 2 generates a mark image composed of index images (crosses in FIG. 2) indicating the positions of the human eyes, so that the eyes are photographed at ideal positions and sizes. The index image may be any image that shows the position at which the object or a specific part of the object is to be arranged. The example of FIG. 2 uses cross-shaped index images, but other shapes such as a circle, a solid line, a dotted line, or a rectangle may be used.

[0027] Further, if the mark image is composed of a plurality of index images rather than a single index image, the direction and orientation of the object, or of a specific portion of the object, can also be arranged correctly. For example, by using two index images as shown in FIG. 2 and arranging both eyes on the two index images, shooting and recording can be performed with the face facing forward.

In the following description, the whole image including one or more index images will be described as a mark image.

FIG. 3 shows an example of the mark image generating means 2. The mark image generating means 2 shown in FIG. 3 includes image generating means 21 and mark basic information holding means 22. The mark basic information holding means 22 constitutes storage means, mark image type information storage means, and mark image display position information storage means.

The mark basic information holding means 22 holds basic information on the mark (image data of the mark and its coordinate values, for example the coordinate values of the right-eye and left-eye positions in the display area of the display means 4).

The image generating means 21 reads out basic information on the mark from the mark basic information holding means 22, generates a mark image, and outputs the mark image to the image superimposing means 3.
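The relationship between the held basic information and the generated mark image can be sketched as follows. This is a minimal illustration under stated assumptions: `MarkBasicInfo` and `generate_mark_image` are hypothetical names, the image is a plain list of rows in which 0 means "no mark", and the cross arm length and pixel value are arbitrary choices not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values; 0 means "no mark"


@dataclass
class MarkBasicInfo:
    """Hypothetical stand-in for the data held by the mark basic information holding means 22."""
    index_positions: List[Tuple[int, int]]  # (x, y) centre of each index image, e.g. right eye, left eye
    arm_length: int = 2                     # half-length of each cross arm in pixels (assumed)
    value: int = 255                        # pixel value used to draw the mark (assumed)


def generate_mark_image(info: MarkBasicInfo, width: int, height: int) -> Image:
    """Draw one cross-shaped index image per held position on an all-zero image."""
    mark = [[0] * width for _ in range(height)]
    for cx, cy in info.index_positions:
        for d in range(-info.arm_length, info.arm_length + 1):
            if 0 <= cx + d < width and 0 <= cy < height:
                mark[cy][cx + d] = info.value  # horizontal arm
            if 0 <= cy + d < height and 0 <= cx < width:
                mark[cy + d][cx] = info.value  # vertical arm
    return mark
```

The returned image has the same dimensions as the display area, so it can be passed directly to a superimposition step together with the input image.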

[0032] For the convenience of the user, the mark image generating means 2 may be configured so that the mark size (the size of the cross), the position, and the outer size can be changed. In this case, mark position/size changing means 23 for changing the mark size (cross size), position, and outer size is added to the mark image generating means 2, as shown in FIG. 4. By adopting such a configuration, the mark can be changed to a size or the like that can be easily recognized by the user. The mark position/size changing means 23 constitutes mark image moving means and mark image adjusting means.
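One way such a position and size change could act on the held coordinate values is sketched below. The function name `move_and_resize_mark` is hypothetical, and scaling about the centroid of the index positions before translating is an assumption of this example; the specification does not state how the change is computed.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def move_and_resize_mark(points: List[Point], dx: int, dy: int, scale: float) -> List[Point]:
    """Scale index positions about their centroid, then shift them by (dx, dy).

    Scaling about the centroid changes the outer size of the mark
    (for example the distance between the two eye crosses) without
    moving the mark as a whole; the shift then moves it on the display.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [
        (round(cx + (x - cx) * scale) + dx, round(cy + (y - cy) * scale) + dy)
        for x, y in points
    ]
```

For instance, doubling the scale of two eye positions 40 pixels apart yields positions 80 pixels apart around the same midpoint, which corresponds to asking the user to photograph the face at a larger size.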

[0033] Further, a configuration may be adopted in which the color of the mark image can also be changed. In this case, as shown in FIG. 5, a mark color changing means 24 for changing the mark color is added to the mark image generating means 2. By adopting such a configuration, it is possible to make the color of the mark easily recognizable to the user even in backlight or at night. The mark color changing means 24 constitutes a mark image color changing means.

[0034] Further, it may be configured to display the description of the mark image. For example, as shown in FIG. 6, a description such as “Right eye” or “Left eye” or a description such as “Adjust to eye position” may be displayed together with the mark image. In this case, the mark basic information holding means 22 also holds information on the description of the mark image.

The image superimposing means 3 combines the image of the object from the image input means 1 with the mark image obtained by the mark image generating means 2, and generates a superimposed image in which the mark image is placed on the image of the object (input image). The image superimposing means 3 constitutes superimposing display means, mark image explanation displaying means, and mark image luminance adjusting means.

As a first example of the superposition method, a mark image priority type in which a mark image is prioritized over an input image can be considered. This method can be expressed as follows when the input image is f(x, y), the mark image is g(x, y), and the superimposed image is h(x, y).

[0037] [Equation 1]

$$
h(x, y) =
\begin{cases}
f(x, y) & \text{if } g(x, y) = 0 \quad \cdots (1) \\
g(x, y) & \text{if } g(x, y) \neq 0 \quad \cdots (2)
\end{cases}
$$

[0038] That is, when the input image and the mark image are at the same display coordinates, the mark image is configured to be displayed preferentially.
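Equations (1) and (2) can be illustrated with a short sketch. It assumes grayscale images stored as lists of rows and a mark image whose background pixels are 0; the function name `superimpose_mark_priority` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def superimpose_mark_priority(f: Image, g: Image) -> Image:
    """Mark image priority type: h = g where g != 0, otherwise h = f."""
    return [
        [g_px if g_px != 0 else f_px for f_px, g_px in zip(f_row, g_row)]
        for f_row, g_row in zip(f, g)
    ]
```

One consequence of this rule is that a mark pixel with value 0 can never be drawn, so a black mark would need a different background convention.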

[0039] As a second example of the superposition method, a luminance-mixed type that mixes the luminance values of the input image and the mark image can be considered. This method can be expressed as follows when the input image is f(x, y), the mark image is g(x, y), and the superimposed image is h(x, y).

[0040] [Equation 2]

$$
h(x, y) = \alpha \, f(x, y) + \beta \, g(x, y) \quad \cdots (3)
$$

Here, α and β are weighting constants. When α < β, the mark image is more visible than the input image, and when α > β, the input image is more clearly visible than the mark image. The configuration may also be such that α and β can be changed to any value, so that the superimposition ratio can be changed and the mark image can be made transparent or translucent as desired.
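The luminance mixing of equation (3) can be sketched as below, with the two weighting constants named `alpha` (for the input image) and `beta` (for the mark image). Rounding and clamping the result to the range 0 to `max_value` is an assumption added so that the output stays a valid 8-bit luminance; the specification does not state how out-of-range values are handled.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def superimpose_blend(f: Image, g: Image, alpha: float, beta: float,
                      max_value: int = 255) -> Image:
    """Luminance-mixed type: h = alpha * f + beta * g, applied to every pixel."""
    return [
        [
            # Clamp to [0, max_value]; this clamping is an assumption.
            max(0, min(max_value, round(alpha * f_px + beta * g_px)))
            for f_px, g_px in zip(f_row, g_row)
        ]
        for f_row, g_row in zip(f, g)
    ]
```

With `alpha=1.0` and a small `beta`, the mark appears as a faint translucent overlay; with `alpha=0.0` and `beta=1.0`, only the mark image remains.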

[0042] As a third example of the superposition method, a position-mixed type in which a mark image is represented by a broken line or a dotted line can be considered. This method can be expressed as follows when the input image is f(x, y), the mark image is g(x, y), and the superimposed image is h(x, y).

[Equation 3]

$$
h(x, y) =
\begin{cases}
g(x, y) & \text{if } i(x, y) = 0 \quad \cdots (4) \\
f(x, y) & \text{otherwise} \quad \cdots (5)
\end{cases}
$$

Here, i (x, y) is a condition such as a dotted line or a broken line regarding x and y.
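A possible form of the condition i(x, y) is sketched below. The specification only says that i(x, y) expresses a dotted or broken line, so the concrete pattern here is an assumption: i(x, y) is taken to be 0 only where the pixel lies on the mark (g is non-zero) and also falls inside a "drawn" segment of a repeating dash pattern of length `dash_len`. The names `dash_condition`, `superimpose_dashed`, and `dash_len` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def dash_condition(g: Image, x: int, y: int, dash_len: int = 2) -> int:
    """Return 0 where the mark should be drawn, 1 elsewhere (assumed pattern)."""
    on_mark = g[y][x] != 0
    in_drawn_segment = ((x + y) // dash_len) % 2 == 0
    return 0 if (on_mark and in_drawn_segment) else 1


def superimpose_dashed(f: Image, g: Image, dash_len: int = 2) -> Image:
    """Position-mixed type: h = g where i(x, y) == 0, otherwise h = f."""
    return [
        [
            g[y][x] if dash_condition(g, x, y, dash_len) == 0 else f[y][x]
            for x in range(len(f[y]))
        ]
        for y in range(len(f))
    ]
```

Because the input image shows through the gaps of the dashed mark, the part of the object under the mark stays partly visible during alignment.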

[0045] Examples of methods of superimposing the input image and the mark image have been described above. However, the present invention is not limited to these. For example, a combination of the luminance-mixed type and the position-mixed type may be used.

Further, the image superimposing means 3 may include a mechanism that allows the user to turn on and off the mark image. In this case, the image superimposing means 3 has a superimposed image generating means 31 and a mark image on / off designating means 32 as shown in FIG. 7, and superimposes according to a mark image on / off instruction by the mark image on / off designating means 32. The image generating means 31 is configured to turn on or off the superimposition of the mark image.

[0047] The display means 4 displays the image generated by the image superimposing means 3.

Specific examples of the display means 4 are not limited to a liquid crystal display used in a digital camera, but may be a CRT monitor, a plasma display, or the like. Alternatively, an optical finder may be used. The instruction means 5 is for instructing image recording, and is, for example, the shutter of a camera. The instruction to record an image is given by pressing the shutter or by a voice instruction from the user.

Note that the instruction means 5 may be an instruction means based on a video processing technique in addition to the above-mentioned means such as a button such as a shutter in a camera, an instruction by voice, and a remotely operated switch.

For example, the instruction means 5 shown in FIG. 8 includes a basic image storage means 51, an image comparison means 52, a holding time recording means 53, and an instruction determination means 54. The image comparing means 52 compares the input image with the image stored in the basic image storing means 51 (initially a black or white image). If the difference is larger than a predetermined threshold (not similar), the time of the holding time recording means 53 is set to 0 and the image in the basic image storage means 51 is replaced with the input image. If the difference is smaller than the predetermined threshold (similar), the image in the basic image storage means 51 is retained and the time stored in the holding time recording means 53 is updated.

[0051] The instruction determining means 54 issues an input instruction when the time stored in the holding time recording means 53 exceeds a predetermined threshold.

[0052] By adopting such a configuration, an input instruction can be issued when the input image has not changed for a specific time. The user can therefore perform input without pressing the shutter, simply by holding the camera still for a specific time.
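The behaviour of the instruction means 5 of FIG. 8 can be sketched as a small state machine. This is a hedged illustration only: the mean absolute pixel difference used as the comparison measure, the class name `StillnessTrigger`, and the parameters `diff_threshold`, `hold_threshold`, and `dt` (time elapsed since the previous frame) are assumptions, not details given in the specification.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def mean_abs_diff(a: Image, b: Image) -> float:
    """Assumed difference measure: mean absolute difference over all pixels."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


class StillnessTrigger:
    """Issues a capture instruction once the input has stayed similar long enough."""

    def __init__(self, width: int, height: int,
                 diff_threshold: float, hold_threshold: float) -> None:
        # Basic image storage (51): starts as an all-black image.
        self.basic_image: Image = [[0] * width for _ in range(height)]
        # Holding time recording (53).
        self.held_time = 0.0
        self.diff_threshold = diff_threshold
        self.hold_threshold = hold_threshold

    def update(self, frame: Image, dt: float) -> bool:
        """Process one frame; return True when capture should be instructed."""
        # Image comparison (52).
        if mean_abs_diff(frame, self.basic_image) > self.diff_threshold:
            # Not similar: reset the timer and replace the basic image.
            self.held_time = 0.0
            self.basic_image = [row[:] for row in frame]
        else:
            # Similar: keep the basic image and accumulate the holding time.
            self.held_time += dt
        # Instruction determination (54).
        return self.held_time > self.hold_threshold
```

In use, `update` would be called once per displayed frame, and the first `True` result would take the place of a shutter press.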

[0053] The above example was described on the basis of a face image; however, the object is not limited to a face, and the invention can be applied to general objects such as flowers, cars, vinyl, and animals.

The image storage unit 6 records the image from the image input unit 1 according to the recording instruction of the instruction unit 5. The medium on which the image is recorded is a RAM, a flash memory, a hard disk, or the like.

Next, an operation in the above configuration will be described.

FIG. 9 is a diagram showing an example of the operation in the first embodiment.

In FIG. 9, an image of a subject (object) is input by the image input means 1. Meanwhile, a mark image indicating the position where the eye position of the subject should be placed on the display image is output from the mark image generating means 2.

[0058] The mark image and the input image from the image input means 1 are superimposed by the image superimposing means 3, and the superimposed image is displayed on the display means 4 and presented to the user.

The user adjusts the zoom-in/zoom-out function of the image input means 1 and the relative position between the image input means 1 and the object so that, in the superimposed image, the position of the pupil of each eye overlaps the position of the mark image. After the adjustment, the instruction means 5 instructs image capture, and the input image at that time is recorded in the image storage means 6.

By operating as described above, the image recorded in the image storage means 6 is an image in which both eyes appear at the mark positions, so the accuracy of extracting both eyes (the specific portions) is greatly improved. In addition, the object recognition accuracy can be greatly improved.

FIG. 10 is a diagram showing an example of another operation in the first embodiment.

FIG. 10 shows an example in which a car is recognized as a target object, in which a tire is a specific part. Then, the mark image indicates a position where the tire of the specific part is to be arranged.

[0063] The image input means 1 first captures an image of the car from a diagonally right direction. Since the tire positions are defined by the mark image, the car is then photographed from the side so as to match them, and the image is recorded when the positions of the mark image and the tires coincide. As a result, an image of the car at the appropriate position and orientation is recorded.

According to the present invention as described above, an image in which an object or a specific portion of the object is arranged at an ideal position and size in the image can be recorded. Because the mark image serves as an index, the target object or a specific part of the target object can be recorded in an image in which its size, position, and orientation are correctly defined. This greatly improves the accuracy of extracting the target object and its specific part, and further greatly improves the recognition accuracy of the target object and the specific part of the target object.

Note that the image storage means 6 need not be provided inside the image input device; it may be provided in a remote server connected via a network. Since the image storage means 6 often requires a high processing capacity, such a configuration is well suited to a mobile phone or the like having a low processing capacity.

[Second Example]

Next, a second embodiment of the present invention will be described. In the present embodiment, since the configuration of the entire image input device is the same as that of the first embodiment, the description will be made using the reference numerals in FIG.

In the first embodiment, the example of the mark image in the case where the object to be recognized is a human face and the specific portion is the human eyes is described. However, the present invention can be applied not only to a human face but also to a person, a car, a flower, and other general objects. Therefore, in a second embodiment, an example of a mark image according to the type of an object to be recognized and a specific portion of the object will be described. Hereinafter, an example of the mark image will be described.

The mark image shown in FIG. 11 adopts a shape that can provide the eye line and the position of the mouth as information to the user side.

[0068] The mark image shown in FIG. 12 regards the entire face as a specific portion and shows the outline of the face, so that the face image to be recognized is given an appropriate size and position.

The mark image shown in FIG. 13 is another example in which the entire face is regarded as a specific part, and has the shape of a circumscribed rectangle.

[0070] The mark image shown in Fig. 14 is an example of a mark image that can be adjusted to include the positional relationship of a plurality of objects. This is an example in which a simple image can be captured.

FIG. 15 is an example in which a detailed depiction of an object is used for a mark image. In FIG. 15, the body shape of the horse can be used as it is as recognition information, and by sending it to the recognition processing side described in the third embodiment below, the recognition processing load can be greatly reduced.

The mark image shown in FIG. 16 is an example using an expression reminiscent of an actual image, and is an example of a mark image that resembles an eye. As a result, the user can easily recognize what specific part the displayed mark image indicates.

The mark image shown in FIG. 17 defines the location of a person. It can also be applied to things like baseball and commemorative photography with multiple people.

The mark image shown in FIG. 18 is an example used for inputting a flower. The recognition processing load can be greatly reduced by sending the position information of the center (where there is a sepal, etc.) and the petals as recognition information to the recognition processing side described in the third embodiment below.

The mark image shown in FIG. 19 is a mark image for recognizing the face position of a person in a landscape. When it is difficult to detect a person in the scenery, sending the information on the face position to the recognition processing side described in the third embodiment greatly reduces the face detection work.

With the mark image shown in FIG. 20, which designates a single point, the size cannot be fixed; however, if the object to be recognized (specific part) is small, such as a ring, this is sufficient in many cases.

As described above, in the mark image, the size of the target object or the specific part of the target object in the image is normalized by designating the position of the target object or the specific part of the target object. Alternatively, the detection range may be fixed by indicating one point. In addition, by indicating the parts of a plurality of objects, it is possible to specify a plurality of objects from the video.

[0078] A single type of the mark images described above may be fixed in advance as a general rule. Alternatively, a plurality of mark images may be prepared in the mark image generating means 2 so that the user can select a mark in advance from a menu. In this case, a mark selection means 25 using a menu as shown in FIG. 21 is provided in the mark image generation means 2.

[0079] Further, the mark image may be in a format that allows a user to newly create and register the position, location, size, and the like.

[Third Example]

Next, a third embodiment of the present invention will be described.

In the first and second embodiments described above, a case has been described in which a mark image is used to record an image in which a target object or a specific portion of the target object is arranged at an appropriate size and position. If, in addition to the recorded image data, the position information of the mark image, information on the meaning of the mark image, and the like are also recorded and used in the image recognition processing, the accuracy of image recognition can be increased and the processing load can be reduced. Therefore, in the third embodiment, an example will be described in which not only the image data of a target object but also information on the mark image and the like are recorded at the same time.

FIG. 22 is a block diagram of the third embodiment. Components similar to those of the first embodiment are given the same reference numerals, and detailed description thereof is omitted. The image storage means 6a has the following functions in addition to the functions described in the first and second embodiments.

The difference from the first and second embodiments is that the position information of the mark image recorded in the mark basic information holding means 22 of the mark image generating means 2 (the coordinate positions of both eyes in the example) is input to the image storage means 6a. When an instruction to record an image is issued by the instruction means 5, this position information is obtained and recorded together with the image data in the image storage means 6a.

[0084] In this way, at the time of image recognition processing, a target object or a specific portion of the target object can be accurately extracted based on the position information of the mark image, so that the accuracy of the recognition processing is improved and the processing load can be reduced.

Further, not only the position information of the mark image but also the type of the mark image (for example, a mark image for arranging eyes) may be recorded. For example, when the mark image used at the time of photographing is a cross mark at which the human eye is to be placed, the fact that the mark image used is a mark image for the human eye is recorded together with the position information of the mark image. During the recognition process, it is then known that the specific part of the object is the eye and where the eye is located in the recorded image, so that the accuracy of the recognition process is improved and the processing load can be greatly reduced.

[0086] As described above, by recording, together with the image of the object, not only the position of the object or of a specific portion of the object but also information about the object and the specific portion, the accuracy of the recognition process is further improved and the processing load can be greatly reduced.

[0087] Further, when there are a plurality of mark images or when it is desired to generalize the recognition engine, the input video and mark information, such as the type and coordinate values of the mark image, may be stored in the image storage means 6a in the form of metadata.
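One way such metadata might be laid out is sketched below using JSON. The specification does not define a metadata format, so everything here is an assumption for illustration: the field names (`image`, `marks`, `mark_type`, `label`, `position`), the class `MarkInfo`, and the function `build_record` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class MarkInfo:
    mark_type: str             # kind of mark, e.g. "eye" (hypothetical vocabulary)
    label: str                 # description shown with the mark, e.g. "right eye"
    position: Tuple[int, int]  # (x, y) coordinate value in the display area


def build_record(image_file: str, marks: List[MarkInfo]) -> str:
    """Serialize the image reference and its mark information as one JSON record."""
    record = {"image": image_file, "marks": [asdict(m) for m in marks]}
    return json.dumps(record, sort_keys=True)


def load_marks(record_json: str) -> List[MarkInfo]:
    """Recover the mark information on the recognition side."""
    data = json.loads(record_json)
    return [
        MarkInfo(m["mark_type"], m["label"], tuple(m["position"]))
        for m in data["marks"]
    ]
```

Keeping the mark type alongside the coordinates lets a general-purpose recognition engine decide which part templates to apply without knowing in advance which mark was used.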

Further, the input video and the mark video may be separately stored in the image storage unit 6a.

[Fourth embodiment]

Next, a fourth embodiment of the present invention will be described.

In the first to third embodiments described above, the configuration covers processing up to the recording of the image to be recognized. In the fourth embodiment, in addition to the configuration of the first to third embodiments, processing is performed up to the recognition of the recorded image.

FIG. 23 is a block diagram showing a fourth embodiment.

[0092] In the fourth embodiment, an image recognition means 7 is provided in addition to the configuration of the first to third embodiments. The image recognition means 7 performs a process of analyzing an image and recognizing a target portion based on the image data recorded in the image storage means 6 or 6a.

Hereinafter, the modes of the image recognition means 7 will be described in detail. In the following description, it is assumed that, when an image is analyzed to recognize an object or a specific part of the object, the semantic information of the mark image (mark image type information) and the position information described in the third embodiment are recorded together with the image data of the object.

First, the first mode of the image recognition means 7 will be described.

FIG. 24 is a block diagram of the image recognition means 7 according to the first mode.

The image recognition means 7 as shown in FIG. 24 includes a recognition template storage means 71, a position matching means 72, and a similarity calculation means 73.

[0097] In the recognition template storage means 71, as shown in FIG. 25, templates in which the face of each person is photographed are stored for collation with the input image. When these recognition templates are stored, the positions of both eyes in each recognition template are extracted manually, or automatically by the input device proposed by the present invention.

[0098] In addition, since face shapes differ from person to person, the positions of the eyes and other parts are not the same for each person. If the images are simply superimposed, different parts of the face will be aligned as shown in FIG. 26. Therefore, the position matching means 72 compares the image stored in the image storage means 6 or 6a with the template stored in the recognition template storage means 71. Based on the position information of the mark image of the recorded image and the position information of the template, it applies an affine transformation (a transformation that corrects enlargement/reduction, rotation, and position) to one of the images and, as shown in FIG., operates so that the positions of the parts match.
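When only two corresponding points are known, such as the two eye positions given by the mark image and by the template, the correction of enlargement/reduction, rotation, and position can be determined in closed form. The sketch below assumes exactly that two-point case and uses complex arithmetic, where a point (x, y) is treated as x + iy and the transform is z' = a·z + b. This is a four-parameter similarity transform, a special case of an affine transformation; the function names are hypothetical, and resampling of the image itself is omitted.

```python
from typing import Tuple

Point = Tuple[float, float]


def similarity_from_eyes(src_left: Point, src_right: Point,
                         dst_left: Point, dst_right: Point) -> Tuple[complex, complex]:
    """Solve z' = a*z + b so that the source eye positions map onto the destination ones.

    abs(a) is the scale factor, the argument of a is the rotation angle,
    and b is the translation.
    """
    p1, p2 = complex(*src_left), complex(*src_right)
    q1, q2 = complex(*dst_left), complex(*dst_right)
    if p1 == p2:
        raise ValueError("the two source points must be distinct")
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return a, b


def apply_transform(a: complex, b: complex, point: Point) -> Point:
    """Map one coordinate of the recorded image into the template's coordinates."""
    z = a * complex(*point) + b
    return (z.real, z.imag)
```

For instance, if the recorded eyes are 10 pixels apart and the template eyes are 20 pixels apart on the same horizontal line, the solution is a pure scaling by 2 plus a translation.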

[0099] The similarity calculation means 73 performs recognition processing by comparing the pixel values, or the values of the feature amounts, of each template with those of the image recorded in the image storage means 6 or 6a, in a state where the positions of the element parts coincide.

[0100] Depending on the state of the recognition target, a specific portion may not be completely visible. For example, as shown in FIG. 28, sunglasses may be worn. In such a case, it is difficult to detect the position of the eyes automatically, but because the image recorded in the image storage means 6 includes the position information of the eyes, that is, the position information of the mark image, accurate positioning can be performed.

[0101] As described above, the position at which the target object and the specific part of the target object are arranged is known, so that the recognition accuracy of the target object and the specific part of the target object can be significantly improved.

Next, a second mode of the image recognition means 7 will be described.

[0103] In the recognition process, in the case of a face or the like, the accuracy can be improved by not performing the similarity evaluation for a part that largely changes due to facial expressions or the like. For example, when a facial expression of a person is photographed, the area around the mouth is largely changed by the facial expression. If the similarity is evaluated equally without taking such changes into account, the accuracy of the similarity will deviate. In such a case, the problem is solved by using only the stable part without using the part around the mouth when calculating the similarity. Therefore, the image recognition means 7 as shown in FIG. 29 is provided with a similarity calculation use section setting means 74 instead of the position matching means 72.

For example, as shown in FIG. 30, when the recorded image changes drastically around the mouth because of the facial expression, the similarity calculation use section setting means 74 designates the area around the mouth as a part to be excluded. By not performing the similarity evaluation for that part, an image analysis that eliminates the influence of fluctuations in facial expression can be realized.
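Excluding a designated area from the similarity evaluation can be sketched as follows. The similarity measure (1 minus the normalized mean absolute difference of 8-bit pixel values) and the rectangle representation of the excluded area are assumptions chosen for brevity; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # assumed representation: rows of 8-bit grayscale values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def in_any_box(x: int, y: int, boxes: Sequence[Box]) -> bool:
    """True if pixel (x, y) lies inside at least one excluded rectangle."""
    return any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in boxes)


def masked_similarity(image: Image, template: Image,
                      excluded: Sequence[Box] = ()) -> float:
    """Similarity in [0, 1] computed only over pixels outside the excluded boxes."""
    total = 0
    count = 0
    for y, (img_row, tpl_row) in enumerate(zip(image, template)):
        for x, (a, b) in enumerate(zip(img_row, tpl_row)):
            if in_any_box(x, y, excluded):
                continue  # e.g. the area around the mouth
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("every pixel is excluded")
    return 1.0 - total / (count * 255)
```

Since the mouth position follows from the mark positions, the excluded rectangle could be fixed once per mark layout rather than detected in each image.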

Next, a third mode of the image recognition means 7 will be described.

The present embodiment is a method that utilizes the sum of similarities for each part. An example of the image recognition means 7 for realizing this is shown in FIG.

The image recognition means 7 as shown in FIG. 31 includes a recognition part template storage means 80, a part similarity calculation means 81, a part extraction means 82, and an overall similarity derivation means 83.

FIG. 32 shows an image recorded in the image storage means 6 or 6a and a recognition template stored in the recognition part template storage means 80. The recognition templates are stored for each part.

[0109] Based on the position information of the image stored in the image storage means 6 or 6a, the part extraction means 82 extracts from the image an image of each specific part of the object, for example, the left eye, the right eye, and the mouth. The extracted specific parts and the templates of the respective parts are then compared by the part similarity calculating means 81 as shown in FIG. 33. The overall similarity deriving means 83 then sums the similarities of the parts calculated by the part similarity calculating means 81 and defines the result as the overall similarity.

[0110] This method can be applied even when a part of the face is hidden, for example by sunglasses. An example of such an application is shown in FIG. 34. The image recognition means 7 shown in FIG. 34 further includes high similarity part selecting means 84 in addition to the configuration of FIG. 31.

[0111] In such a configuration as well, since the positions of the eyes are specified in advance by the position information of the mark image, the part extracting means 82 can extract the sunglasses region as the right eye and the left eye. Prior to the derivation of the overall similarity, the high similarity part selecting means 84 selects a part having a high similarity (in this example, the mouth) as shown in FIG., so that the overall similarity can be derived.
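The part-wise procedure (extract each part at its known position, compare it with the part template, then sum the similarities, optionally keeping only highly similar parts) can be sketched as below. The similarity measure, the rectangle format for part positions, the threshold parameter `min_similarity` used to select high-similarity parts, and all function names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]          # assumed representation: rows of 8-bit grayscale values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def crop(image: Image, box: Box) -> Image:
    """Part extraction: cut out the rectangle given by the recorded position."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def part_similarity(part: Image, template: Image) -> float:
    """Assumed measure: 1 minus the normalized mean absolute pixel difference."""
    diffs = [abs(p - q) for ra, rb in zip(part, template) for p, q in zip(ra, rb)]
    return 1.0 - sum(diffs) / (len(diffs) * 255)


def overall_similarity(image: Image, part_boxes: Dict[str, Box],
                       part_templates: Dict[str, Image],
                       min_similarity: Optional[float] = None
                       ) -> Tuple[float, List[str]]:
    """Sum the part similarities; return the total and the names of the parts used.

    If min_similarity is given, only parts scoring at least that value are
    summed, which mimics selecting high-similarity parts first.
    """
    scores = {
        name: part_similarity(crop(image, box), part_templates[name])
        for name, box in part_boxes.items()
    }
    if min_similarity is not None:
        scores = {n: s for n, s in scores.items() if s >= min_similarity}
    return sum(scores.values()), sorted(scores)
```

When totals computed from different numbers of selected parts are compared across candidates, some normalization by the number of parts used would probably be needed; the specification does not address this.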

Next, a fourth mode of the image recognition means 7 will be described.

[0113] In the fourth mode, target three-dimensional information is used for recognition. It holds three-dimensional information for the recognition target, estimates three-dimensional information of the image based on information of a specific part, creates an image from the three-dimensional data for recognition, and performs recognition.

FIG. 36 is a block diagram of the image recognition means 7 according to the fourth mode.

[0115] The image recognition means 7 includes a three-dimensional face information storage means 90, a face direction estimating means 91, a face direction matching image generating means 92, and a similarity calculating means 93.

[0116] The three-dimensional face information storage means 90 stores three-dimensional face information. Then, the face direction estimating means 91 estimates the face direction angle of the object based on the image (input image of the object) recorded in the image storage means 6 or 6a and the position information of the mark image. For example, as shown in FIG. 37, if the position information is a positional relationship between the eyes and the nose (position information of the mark image), the face direction angle of the target is estimated from these.

[0117] The face direction matching image generating means 92 creates, from the three-dimensional face information stored in the three-dimensional face information storage means 90, a face image that matches the face direction angle estimated by the face direction estimating means 91.

[0118] The similarity calculating means 93 measures the similarity between the image recorded in the image storage means 6 or 6a and the face image with the same face direction generated by the face direction matching image generating means 92, and performs recognition processing.

Next, a fifth mode of the image recognition means 7 will be described.

[0120] The fifth mode is a case where three-dimensional information is applied to an image recorded in the image storage means 6 or 6a. For example, for a face, a general standard face is created, and the image is mapped onto the three-dimensional standard face image in accordance with the information on the parts of the image recorded in the image storage means 6. After mapping onto the three-dimensional standard face image, the image is rotated to create a pseudo frontal image, and template matching with the face information for recognition is performed.

FIG. 38 is an example of a block diagram of such an image recognition means 7.

The image recognition means 7 has a three-dimensional standard face image mapping means 100, a front face generation means 101, a similarity calculation means 102, and a recognition template storage means 103.

[0123] The three-dimensional standard face image mapping means 100 maps the recognition face image onto the three-dimensional standard face using the information of each part (mark position information) as shown in FIG. 39. The frontal face generating means 101 generates a pseudo frontal face from the mapped three-dimensional information. Then, the similarity calculating means 102 calculates the similarity between the generated pseudo frontal face and the frontal-face recognition templates stored in the recognition template storage means 103, and identifies the person.

The above example was described on the basis of a face image; needless to say, however, the object is not limited to a face, and the invention can be applied to general objects such as flowers, cars, vinyl, and animals.

Next, a sixth mode of the image recognition means 7 will be described.

[0126] In the first to fifth aspects, as information for recognizing a target or a specific part of the target, information on the meaning of a mark image recorded at the same time and position information are used. In the sixth mode, unlike the first to fifth modes, a case will be described in which recognition processing of a target or a specific part of the target is performed using an image of the target on which the mark image is superimposed.

FIG. 40 is a block diagram of the image recognition means 7 according to the sixth mode.

The image recognition means 7 includes a mark extracting means 110, a feature amount deriving means 111, a feature amount calculating means 112, and a recognition feature amount storage means 113.

[0129] The mark extracting means 110 identifies the mark image superimposed on the image for recognition based on the color of the mark image or the like. The feature amount deriving means 111 derives a feature amount for the pixels inside or in the vicinity of the mark identified by the mark extracting means 110. Then, the feature amount calculating means 112 collates the derived feature amount with the feature amounts recorded in the recognition feature amount storage means 113 to perform recognition.

FIG. 41 shows an example in which the recorded image is a flower image, and the mark image is superimposed on the outer ring. The mark is specified by the mark extracting means 110 based on the color of the mark image and the like, and the feature amount deriving means 111 creates a histogram of the color of the flower inside the mark. By comparing the histogram with the color histogram in the database by the feature amount calculating means 112, the flower is specified.

[0131] By using mark information (for example, mark color information), the color information of the petals can easily be extracted and used without adding the pixels of stems and leaves as elements of the histogram.
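Building a color histogram only from pixels inside the mark and comparing it with stored histograms can be sketched as follows. The assumptions are substantial and worth stating: the mark is taken to be a circle whose center and radius are already known (the color-based detection of the mark itself is omitted), colors are RGB triples quantized into `levels` bins per channel, and histogram intersection is used as the comparison measure. None of these specifics are given in the specification, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]    # (R, G, B), each 0..255
ColorImage = List[List[Color]]
Histogram = Dict[Tuple[int, int, int], float]


def histogram_inside_circle(image: ColorImage, center: Tuple[int, int],
                            radius: int, levels: int = 4) -> Histogram:
    """Normalized histogram of quantized colors for pixels inside the circular mark."""
    cx, cy = center
    step = 256 // levels
    counts: Counter = Counter()
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            # Only pixels inside the mark contribute, so stems and leaves
            # outside the ring are left out of the histogram.
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                counts[(r // step, g // step, b // step)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Similarity in [0, 1]: sum over bins of the smaller of the two values."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())
```

The flower would then be identified as the database entry whose stored histogram gives the largest intersection with the histogram derived from the input image.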

Next, a seventh mode of the image recognition means 7 will be described.

The seventh mode is an example of a recognition process in a case where an input video and a mark video are separately stored in the image storage means 6 or 6a.

FIG. 42 is a block diagram of the image recognition means 7 according to the seventh mode.

As shown in FIG. 42, the image recognizing means 7 includes an object extracting means 120, a similarity calculating means 121, and a recognition template storage means 122.

[0136] The object extracting means 120 performs a product (AND) process on the separately recorded mark image and input image (recognition image) as shown in FIG., thereby extracting the object.
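The product (AND) of a separately stored mark image and input image can be sketched as a simple masking step. It is assumed here that the mark image is a filled region whose non-zero pixels cover the object, so that it acts as a binary mask; the function name `extract_object` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def extract_object(input_image: Image, mark_image: Image) -> Image:
    """Keep input pixels where the mark image is non-zero; set the rest to 0."""
    return [
        [px if m != 0 else 0 for px, m in zip(img_row, mark_row)]
        for img_row, mark_row in zip(input_image, mark_image)
    ]
```

Storing the two images separately keeps the input image free of overlay pixels, so the extracted region contains only original image data.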

The similarity calculating means 121 performs recognition by comparing the extracted feature amount of each part with the feature amount of each part stored in the recognition template storage means 122.

[0138] The first to seventh aspects of the image recognition means 7 have been described above. However, the present invention is not limited thereto, and may be combined as appropriate.

In the above description, the position information of the mark image, information such as the meaning of the mark image, and the like are used at the time of image analysis. However, since the target object or a specific part of the target object is photographed at an appropriate position and size by means of the mark image, image analysis can also be performed without using information such as the position and meaning of the mark image.

The image recognition means 7 need not be provided inside the image input device; it may be provided in a remote server connected via a network. Since the image recognition means 7 often requires a high processing capacity, such a configuration is well suited to a mobile phone or the like having a low processing capacity.

[0141] [Fifth embodiment]

Next, a fifth embodiment of the present invention will be described.

[0142] The processing of the means in the first to fourth embodiments described above can be executed by a program. Thus, in a fifth embodiment, an example will be described in which the processing of the means in the first to fourth embodiments described above is executed by a program.

FIG. 44 is a block diagram showing a configuration of a computer according to the fifth embodiment.

When the processing is executed by a program, a program memory 50 storing the execution program and the like, an information memory 51 storing the mark image, mark information, and the like, and a microprocessor 52 that executes each process according to the program are provided in place of the mark image generating means 2, the image superimposing means 3, and the image storage means 6.

[0145] The microprocessor 52 creates a mark image based on the information read from the information memory 51, superimposes the mark image on the image input from the image input means 1, and displays the result. Then, after the image is recorded in accordance with an instruction from the instruction means 5, the object is recognized using the image recognition program and the recognition dictionary memory.

[0146] Although the first to fifth embodiments have been described above, one example of a system to which the image input device of the present invention can be applied is a device that integrates a camera, a display device, and an arithmetic device, such as a mobile phone with a camera. The same applies to video cameras with arithmetic units, PDAs, digital cameras, and the like.

[0147] Further, the present invention is applicable even if the image input means 1 and the display means 4 are separate devices. The camera and the display do not need to be integrated.

[0148] Further, a form is also possible in which the image input means 1 (camera) is freely adjusted by remote control, and the image is adjusted while being projected on the display means 4 (a screen or large-screen display) at hand.

[Sixth Embodiment]

Next, a sixth embodiment of the present invention will be described.

The sixth embodiment is a case where the image input device of the present invention is applied to a robot.

The invention can be applied to any robot that has image input means 1, such as a camera or a video camera, and display means 4 that externally displays an input image. If the mark image generating means 2, the image superimposing means 3, the image storing means 6 or 6a, and the image recognizing means 7 are provided inside the robot, the robot can perform the same processing as in the above-described first to fifth embodiments. FIG. 45 shows a robot to which the image input device of the present invention is applied.

The robot shown in FIG. 45 has image input means 1, such as a camera, at the position of its eyes. In addition, the robot has display means 4 on its abdomen for externally displaying the input image, and a mark image indicating the position where a specific portion of the object should be located when the image is input is displayed on the display means 4.

[0153] In the example of FIG. 45, the target object is a human face, and a mark image indicating the position where the eyes should be located is displayed on the display of the display means 4. The user (subject) adjusts the positional relationship with the robot so that his or her eyes overlap the mark image.

[0154] At this time, the subject itself may move, or an instruction to change the position of the robot or the state (zoom or the like) of the image input means 1 (camera) may be given.

When the mark and both eyes overlap, an instruction to capture the input image is issued. The instruction can be given by a human voice or by a remotely operated switch.

[0156] Using the internal image recognition means 7, the robot can perform image recognition on the assumption that a specific portion of the subject appears at a predetermined position in the recorded image, and can thereby identify the person.
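
Because the eyes are assumed to lie at the mark positions, a recognition step could cut the face region directly out of the recorded image instead of searching for it. The following is a minimal sketch under that assumption; the function names `face_region_from_marks` and `crop`, and the margin ratios, are hypothetical and would need tuning for a real recognizer.

```python
from typing import List, Tuple

Image = List[List[int]]  # image as rows of pixel values (assumption)
Box = Tuple[int, int, int, int]


def face_region_from_marks(left_eye: Tuple[int, int], right_eye: Tuple[int, int],
                           width: int, height: int,
                           side: float = 0.5, above: float = 0.5,
                           below: float = 1.5) -> Box:
    """Return (x0, y0, x1, y1) of a box around the eye marks.

    Margins are expressed as multiples of the distance between the eyes
    and are clamped to the image bounds; x1 and y1 are exclusive.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    dist = abs(rx - lx)
    eye_y = (ly + ry) // 2
    x0 = max(0, int(min(lx, rx) - side * dist))
    x1 = min(width, int(max(lx, rx) + side * dist) + 1)
    y0 = max(0, int(eye_y - above * dist))
    y1 = min(height, int(eye_y + below * dist) + 1)
    return x0, y0, x1, y1


def crop(image: Image, box: Box) -> Image:
    """Cut the box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Synthetic 40x30 image whose pixel value encodes its own coordinates.
image = [[x + 100 * y for x in range(40)] for y in range(30)]
box = face_region_from_marks((10, 8), (20, 8), width=40, height=30)
face = crop(image, box)
```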

[0157] Thus, a robot to which the image input device of the present invention is applied can greatly improve the image recognition ability.

In the present embodiment, the image input means 1 (camera) is provided at the position of the robot's eyes and the display means 4 (display) is provided at the robot's abdomen, but they may be placed anywhere. In particular, the display may be a separate monitor rather than part of the robot housing.

[0159] In order to adjust the positional relationship between the robot and the subject, commands such as "move forward", "move back", and "zoom in" may be given to the robot by voice or by a remote control or the like.
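
One way to picture these instructions is a small command table that maps a recognized phrase to a change in the robot's distance from the subject or in the camera's zoom. The sketch below is illustrative only: the `RobotState` fields, the step sizes, and the exact command strings are assumptions, and speech recognition itself is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    distance_cm: float = 100.0   # robot-to-subject distance (hypothetical field)
    zoom: float = 1.0            # camera zoom factor (hypothetical field)


STEP_CM = 10.0      # assumed movement per command
ZOOM_STEP = 0.25    # assumed zoom change per command
MIN_ZOOM = 1.0      # assumed lower bound on the zoom factor


def apply_command(state: RobotState, command: str) -> RobotState:
    """Return a new state after a recognized voice or remote-control command."""
    text = command.strip().lower()
    if text == "move forward":
        return RobotState(max(0.0, state.distance_cm - STEP_CM), state.zoom)
    if text == "move back":
        return RobotState(state.distance_cm + STEP_CM, state.zoom)
    if text == "zoom in":
        return RobotState(state.distance_cm, state.zoom + ZOOM_STEP)
    if text == "zoom out":
        return RobotState(state.distance_cm, max(MIN_ZOOM, state.zoom - ZOOM_STEP))
    return state  # unknown commands leave the state unchanged


state = RobotState()
for spoken in ["Move forward", "zoom in", "zoom in", "dance"]:
    state = apply_command(state, spoken)
```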

[0160] The object may be something other than a person, such as the frame of a painting for the question "What is this picture?" or a specific part of the frame of a toy for the question "What is this toy?"

Industrial applicability

The present invention can be applied to an image input device.

Claims

[1] An image input device, comprising:

display means for displaying an image of an object to be photographed; and

mark superimposing display means for displaying, on the display means, a mark image indicating a position at which a target object or a specific portion of the target object is to be arranged, the mark image being superimposed on the image of the target object.

[2] The image input device according to claim 1,

The image input device, wherein the image of the target object is an image input for recognizing the target object by image analysis.

[3] The image input device according to claim 1,

wherein the mark superimposing display means comprises:

storage means for storing a plurality of mark images corresponding to an object to be recognized or a specific portion of the object;

selecting means for selecting a mark image suitable for an object to be recognized or a specific portion of the object from the plurality of mark images stored in the storage means; and

superimposition display means for superimposing and displaying the mark image selected by the selecting means on the image of the target object.

[4] The image input device according to claim 1,

wherein the mark image is an image that specifies an arrangement of a target object or a specific portion of the target object with one index image.

[5] The image input device according to claim 1,

wherein the mark image is an image that specifies an arrangement of a target object or a specific portion of the target object using a plurality of index images.

[6] The image input device according to claim 1,

wherein the mark image is an image for specifying one target object or a specific portion of the target object in the image.

[7] The image input device according to claim 1,

wherein the mark image is an image for specifying a plurality of objects or specific portions of the objects in the image.

[8] The image input device according to claim 1,

The image input device, wherein the mark superimposing display means includes mark image moving means for moving a display position of a mark image.

[9] The image input device according to claim 1,

The image input device, wherein the mark superimposing display means includes a mark image adjusting means for adjusting a size of a mark image.

[10] The image input device according to claim 1,

The image input device, wherein the mark superimposing display means includes a mark image color changing means for changing a color of a mark image.

[11] The image input device according to claim 1,

The image input device, characterized in that the mark superimposing display means has a mark image brightness adjusting means for adjusting the brightness of the mark image.

[12] The image input device according to claim 1,

wherein the mark superimposing display means comprises:

storage means for storing a description of the mark image; and

mark image explanation display means for also displaying the stored description of the mark image on the display means when the mark image is superimposed and displayed.

[13] The image input device according to claim 1,

An image input device further comprising an image pickup means for picking up an image of an object.

[14] The image input device according to claim 13,

The image input device, wherein the image pickup means and the display means are not housed in one housing.

[15] The image input device according to claim 1,

An image input apparatus, further comprising: instruction means for instructing storage of an image displayed on the display means; and image storage means for storing the image based on an instruction from the instruction means.

[16] The image input device according to claim 15,

wherein the mark superimposing display means has mark image type information storage means for storing, in correspondence with the mark image, mark image type information for identifying the mark image, and

wherein, when an image is stored based on the instruction of the instruction means, the captured image and the mark image type information of the mark image used at the time of imaging are stored in the image storage means.

[17] The image input device according to claim 15,

wherein the mark superimposing display means includes mark image display position information storage means for storing, in correspondence with the mark image, mark display position information, which is information on the display position of the mark image, and

wherein, when an image is stored based on the instruction of the instruction means, the captured image and the mark display position information of the mark image used at the time of imaging are stored in the image storage means.

[18] The image input device according to claim 15,

An image input apparatus, wherein an image in which a mark image is superimposed on a photographed image is stored in the image storage means when an image is stored based on an instruction of the instruction means.

[19] The image input device according to claim 15,

The image input device, wherein, when an image is stored based on an instruction from the instruction means, the captured image and the mark image used at the time of imaging are stored separately in the image storage means.

[20] The image input device according to claim 15,

The image input device, wherein the image input means is configured to detect stillness of the image of the object to be photographed, and to issue an instruction to store the image when the stillness is detected.

[21] The image input device according to claim 15,

The image input device, wherein the image storage means is provided at a remote place where data can be transmitted and received to and from the image input device.

[22] The image input device according to claim 15,

further comprising image recognizing means for analyzing an image stored in the image storage means and performing a recognition process on the target object.

[23] The image input device according to claim 22, wherein

The image input device, wherein the image recognition means is provided at a remote place where data can be transmitted and received to and from the image input device.

[24] The image input device according to claim 22,

The image input device, wherein the image recognizing means is configured to refer to the mark image type information, specify the type of the target object or the specific portion of the target object in the image to be analyzed, and perform an image analysis process.

[25] The image input device according to claim 22,

The image input device, wherein the image recognizing means is configured to refer to the mark display position information, specify the position of the object or the specific portion of the object to be analyzed, and perform an image analysis process.

[26] The image input device according to claim 22,

The image input device, wherein the image recognizing means is configured to recognize the mark image from the image on which the mark image is superimposed, specify the object or the specific portion of the object to be analyzed, and perform an image analysis process.

[27] The image input device according to claim 22, wherein

The image input device, wherein the image recognizing means is configured to compare a recorded image with a mark image to specify the object or the specific portion of the object to be analyzed, and to perform an image analysis process.

[28] A robot equipped with the image input device according to claim 1.

[29] A robot equipped with the image input device according to claim 15.

[30] A robot equipped with the image input device according to claim 22.

[31] An image input program that causes a computer to function as an image input device, the program causing the computer to execute:

a display step of displaying an image of an object to be photographed; and

a mark superimposing display step of displaying a mark image indicating a position where the target object or a specific part of the target object is to be arranged, so as to be superimposed on the displayed image of the target object.

[32] The image input program according to claim 31, wherein

The image input program, wherein the image of the object is an image input for recognizing the object by image analysis.

[33] The mark superimposing display step includes:

a selection step of selecting a mark image suitable for an object to be recognized or a specific portion of the object from a plurality of mark images stored in the storage means; and

a superimposing display step of superimposing and displaying the mark image selected in the selection step on the image of the target object.

[34] The image input program according to claim 31, wherein

the mark image is an image that specifies an arrangement of a target object or a specific portion of the target object with one index image.

[35] The image input program according to claim 31, wherein

the mark image is an image that specifies an arrangement of a target object or a specific portion of the target object by a plurality of index images.

[36] The image input program according to claim 31, wherein

the mark image is an image that specifies one target object or a specific portion of the target object in the image.

[37] The image input program according to claim 31, wherein

the mark image is an image for specifying a plurality of objects or specific portions of the objects in the image.

[38] The image input program according to claim 31, wherein

The image input program, wherein the mark superimposing display step includes a mark image moving step of moving a display position of the mark image.

[39] The image input program according to claim 31, wherein

The image input program, wherein the mark superimposing display step includes a mark image adjusting step of adjusting the size of the mark image.

[40] The image input program according to claim 31, wherein

The image input program, wherein the mark superimposing display step includes a mark image color changing step of changing a color of the mark image.

[41] The image input program according to claim 31, wherein

An image input program, characterized in that the mark superimposing and displaying step includes a mark image luminance adjusting step of adjusting the luminance of the mark image.

[42] The image input program according to claim 31, wherein

the mark superimposing display step includes a mark image explanation displaying step of, when the mark image is superimposed and displayed, also displaying the explanation of the mark image stored in the storage means on the display means.

[43] The image input program according to claim 31, wherein

An image input program, further comprising: an instruction step for giving an instruction to store an image displayed on the display means; and a step of storing the image in the image storage means based on the instruction in the instruction step.

[44] The image input program according to claim 43,

The image input program, comprising a step of storing, in the image storage means, a captured image and mark image type information for identifying the mark image used at the time of imaging, when an image is stored based on the instruction in the instruction step.

[45] The image input program according to claim 43,

The image input program, comprising a step of storing, in the image storage means, a captured image and mark display position information, which is information on the display position of the mark image used at the time of imaging, when an image is stored based on the instruction in the instruction step.

[46] The image input program according to claim 43,

The image input program, comprising a step of storing, in the image storage means, an image in which a mark image is superimposed on a captured image, when an image is stored based on the instruction in the instruction step.

[47] The image input program according to claim 43,

The image input program, comprising a step of separately storing, in the image storage means, a captured image and the mark image used at the time of imaging, when an image is stored based on the instruction in the instruction step.

[48] The image input program according to claim 43,

The image input program, wherein the image inputting step includes a step of detecting stillness of the image of the object to be photographed, and instructing storage of the image when the stillness is detected.

[49] The image input program according to claim 43,

An image input program, further comprising an image recognition step of analyzing an image stored in the image storage means and performing a recognition process for a target object or a specific portion of the target object.

[50] The image input program according to claim 49, wherein

the image recognition step includes a step of referring to the mark image type information, specifying the type of the object or the specific portion of the object to be analyzed, and performing an image analysis process.

[51] The image input program according to claim 49, wherein

the image recognition step includes a step of referring to the mark display position information, specifying the position of the object or the specific portion of the object to be analyzed, and performing an image analysis process.

[52] The image input program according to claim 49, wherein

the image recognition step includes a step of recognizing the mark image from the image on which the mark image is superimposed, specifying the object or the specific portion of the object to be analyzed, and performing an image analysis process.

[53] The image input program according to claim 49, wherein

the image recognition step includes a step of specifying the object or the specific portion of the object to be analyzed by comparing the recorded image with the mark image, and performing an image analysis process.
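
The stillness detection recited in claims 20 and 48 is not specified further in the claims. One common approach, shown here purely as an assumption and not as the claimed method, is frame differencing: the image is treated as still once the mean absolute difference between consecutive frames stays below a threshold for several frames in a row. The names `mean_abs_diff` and `first_still_index`, the threshold, and the hold count are all hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale frame as rows of pixel values (assumption)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def first_still_index(frames: Sequence[Image], threshold: float = 2.0,
                      hold: int = 3) -> int:
    """Index of the frame at which `hold` consecutive differences fell
    below `threshold`, or -1 if that never happens."""
    run = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) < threshold:
            run += 1
            if run >= hold:
                return i
        else:
            run = 0   # motion resets the count
    return -1


def flat(value: int) -> Image:
    """Uniform 4x3 test frame."""
    return [[value] * 4 for _ in range(3)]


frames = [flat(0), flat(50), flat(52), flat(52), flat(53), flat(53), flat(53)]
trigger = first_still_index(frames)   # frame index at which storage would be instructed
```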