WO2011043060A1 - Tracking target selection device, method, program, and circuit - Google Patents
Tracking target selection device, method, program, and circuit
- Publication number
- WO2011043060A1 (PCT/JP2010/005956)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- tracking
- video
- unit
- tracking target
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/62—Control of parameters via user interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- the present invention relates to a technique for accurately selecting a tracking target object from an image in the fields of a digital still camera, a digital video camera, a network camera, a security camera, and the like.
- these image pickup apparatuses are normally provided with a display such as an LCD, and the user can take an image of a subject while confirming the recorded video on the display.
- there are imaging devices that detect human faces and perform adjustment processing such as auto-focus (AF) and automatic exposure (AE) on the detected faces.
- there are also imaging devices that measure a person's smile level and perform shutter control accordingly.
- there are further imaging devices that track a subject and control AF processing, AE processing, and the like in accordance with the tracking.
- in one known technique, a tracking target object region is specified by the user's manual input (for example, by touching the tracking target object region on a touch panel), color features are extracted from the specified region, and the object is tracked using the extracted color features. In another, tracking candidate objects are detected from the image, one of the detected candidates is selected, and the selected candidate is set as the tracking target (see, for example, Patent Document 1 and Patent Document 2).
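- for illustration only, a minimal sketch of such color-feature tracking (hypothetical code, not taken from Patent Documents 1 or 2; the search radius, window step, and histogram distance are arbitrary choices):

```python
import numpy as np

def hue_histogram(patch, bins=16):
    """Normalized hue histogram of an HxWx3 HSV patch (OpenCV hue range 0-179)."""
    hist, _ = np.histogram(patch[:, :, 0], bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)

def track_by_color(frame_hsv, template_hist, box, search=24, step=4):
    """Slide the previous box around its old position and keep the window
    whose hue histogram is closest (L1 distance) to the template."""
    x, y, w, h = box
    best, best_dist = box, float("inf")
    for dy in range(-search, search + 1, step):
        for dx in range(-search, search + 1, step):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0:
                continue
            patch = frame_hsv[ny:ny + h, nx:nx + w]
            if patch.shape[:2] != (h, w):
                continue
            dist = np.abs(hue_histogram(patch) - template_hist).sum()
            if dist < best_dist:
                best, best_dist = (nx, ny, w, h), dist
    return best
```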
- FIG. 21 is a block diagram of the prior art described in Patent Document 1.
- in these techniques, the tracking target object needs to be stationary (or nearly stationary) when it is selected.
- FIG. 2 is a diagram for explaining the problems of the prior art.
- the subject is often moving.
- when selecting a tracking target for tracking performed in AF, AE, or the like, the user selects the target while watching a display (monitor) such as an LCD (Liquid Crystal Display).
- the subject may move at the very moment the tracking target is selected. For example, the subject video 91 may be displayed as the first video 91a1 at the first position 91a1P at a first time before the movement, and as the second video 91a2 at the second position 91a2P at a second time; in this way, the video 91 may move (change position) between the first position 91a1P and the second position 91a2P.
- as a result, an object at a position deviating from the user's intention may be selected (for example, an object at the first position 91a1P at the second time, when the video 91 is actually at the second position 91a2P), and tracking based on the incorrect selection is then performed.
- such movement of the video 91 may be caused, for example, by movement of the subject (see the subject 103x in FIG. 14) in the three-dimensional space (see the road 103R) captured in the video 91.
- the video 91 may also move even when the subject is almost stationary in the three-dimensional space (on the road 103R): to select the target, the user must operate a button on the device or the touch panel, and the camera body may move as a result of these operations. When the camera body moves, the positional relationship between the position of the subject (see the subject 103x) and the position of the camera (see the image sensor 103) shifts.
- consequently, the position of the video 91 (FIG. 2) may change between a plurality of positions (the positions 91a1P, 91a2P, etc.); in this way, movement of the camera may also cause an erroneous selection.
- the tracking target is selected from one or more candidates detected as objects.
- the present invention solves the above problem. Its object is to provide a tracking target selection device, method, program, storage medium, and the like with which the operation (touch, etc.) for selecting the tracking target is simple, and with which the target can be selected easily and accurately even when it is not stationary (see the video 91a) or has poor visibility (see the videos 91b and 91c).
- the first tracking target selection device is a tracking target selection device that selects a tracking target, and includes an object detection unit that detects a predetermined object from an input image (a second input image captured by a camera or the like), and a tracking unit that tracks the detected object and calculates the tracking object candidate region.
- it further includes a composition unit that synthesizes the image of the tracking object candidate region calculated by the tracking unit (an image included in a first input image) at a fixed position in the input image (the first input image), and a display unit that displays the input image after the image has been synthesized at the fixed position by the composition unit.
- it also includes a selection unit that, when the user performs an operation (such as a touch operation) on the combined image displayed at the fixed position in the displayed input image (the first input image after synthesis), selects the object (for example, a person (face), a car, or the like) as the tracking target in tracking in a predetermined process (for example, AF processing).
- with this configuration, even if the target object is moving when the user selects the object to be tracked (from a plurality of objects), the candidate (the synthesized image) is displayed at a fixed position, so the target object intended by the user can be selected accurately.
- the second tracking target selection device has the above configuration and operation, and further includes: a feature extraction unit that extracts a predetermined feature from the image of the target object candidate region being tracked by the tracking unit; a determination unit that calculates a predetermined state of the target object (for example, its direction; see the video 91c in FIG. 2) from the extracted feature and determines whether the calculated state is a predetermined state (for example, whether the direction is front-facing); and a storage unit that stores the tracking target candidate region (the image of the tracking target candidate region) determined by the determination unit to be in the predetermined state.
- storing an area means storing an image of the area.
- with this configuration, the state of the target object candidate region (for example, the direction of the subject imaged in the region) is determined, the image of the region is stored in the storage unit, and the stored image is displayed at a fixed position. Therefore, even if the video of the target object captured at the moment the user makes the selection (the first video in the first input image described above) has poor visibility (see column (B) in FIG. 2), so that the user cannot (or can only with difficulty) determine whether it shows the intended target object, the user can still accurately select the intended target object.
- this device may be called a tracking device, a target selection device, or another name such as an object selection device, for example.
- a display control unit that controls the display may be configured as a part of the display unit that performs the display. That is, the display control unit of the display unit may control the display by the display unit.
- according to the tracking target selection device of the present invention, the target object intended by the user can be selected even when the target object in the image is moving (see column (A) in FIG. 2), or when the image of the target object has poor visibility (the target is small, faces an inappropriate direction, etc.; see column (B)) so that the user cannot (easily) recognize which of a plurality of objects it is.
- moreover, the operation is easy to perform, the degree of simplicity is sufficiently high, and the position of the image to be synthesized ("the other video" in this document) is reliably appropriate.
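- for illustration only, a minimal sketch of the "keep the best candidate per tracked object" bookkeeping described in this summary (hypothetical code; it assumes a state score that is smaller the closer the candidate is to the predetermined state, e.g. front-facing):

```python
class CandidateStore:
    """Per tracked label, keeps the candidate image whose state score is
    best so far (here: smallest deviation from the front-facing state)."""

    def __init__(self):
        self.best = {}  # label -> (score, image)

    def offer(self, label, score, image):
        """Replace the stored candidate only when the new score is better."""
        if label not in self.best or score < self.best[label][0]:
            self.best[label] = (score, image)

    def candidates(self):
        """Images to composite at the fixed positions, in label order."""
        return [self.best[label][1] for label in sorted(self.best)]
```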
- FIG. 1 is a block diagram of a tracking target selection device according to an embodiment of the present invention.
- FIG. 2 is a diagram for explaining the problems of the prior art.
- FIG. 3 is a functional block diagram showing a functional configuration of the tracking target selection device according to Embodiment 1 of the present invention.
- FIG. 4 is a flowchart up to display processing showing the functional configuration of the tracking target selection device according to Embodiment 1 of the present invention.
- FIG. 5 is a flowchart of the tracking target selection process showing the functional configuration of the tracking target selection device according to Embodiment 1 of the present invention.
- FIG. 6 is a flowchart of the tracking unit in the first embodiment of the present invention.
- FIG. 7 is a diagram for explaining the tracking unit according to the first embodiment of the present invention.
- FIG. 8 is a diagram for explaining object orientation calculation in the feature extraction unit according to the first embodiment of the present invention.
- FIG. 9 is a diagram for explaining calculation of face center coordinates in the first embodiment of the present invention.
- FIG. 10 is a diagram for explaining calculation of the nose position coordinates in the first embodiment of the present invention.
- FIG. 11 is a diagram for explaining a storage unit according to Embodiment 1 of the present invention.
- FIG. 12 is a diagram for explaining a display unit according to Embodiment 1 of the present invention.
- FIG. 13 is a diagram for explaining the selection processing according to Embodiment 1 of the present invention.
- FIG. 14 is a diagram for explaining another example in the first embodiment of the present invention.
- FIG. 15 is a functional block diagram illustrating a functional configuration of the tracking target selection device according to the second embodiment of the present invention.
- FIG. 16 is a flowchart up to display processing showing the functional configuration of the tracking target selection device according to Embodiment 2 of the present invention.
- FIG. 17 is a diagram for explaining an example of display in the second embodiment of the present invention.
- FIG. 18 is a functional block diagram illustrating a functional configuration of the tracking target selection device according to the third embodiment of the present invention.
- FIG. 19 is a flowchart up to display processing showing the functional configuration of the tracking target selection device according to the third embodiment of the present invention.
- FIG. 20 is a diagram for explaining an example of display in the third embodiment of the present invention.
- FIG. 21 is a block diagram of the prior art.
- FIG. 22 is a block diagram of the tracking target selection device.
- FIG. 23 is a diagram showing a screen.
- FIG. 24 is a diagram illustrating a plurality of captured images.
- FIG. 25 shows a screen.
- FIG. 26 is a diagram showing screens at a plurality of times.
- FIG. 27 is a diagram showing a small-sized video or the like.
- FIG. 28 is a diagram illustrating an image in which the orientation of the subject is not front-facing.
- the tracking target selection device of the embodiment is a tracking target selection device (tracking target selection device 1, a camera) that selects a tracking target (tracking target 103xm: FIG. 22), and includes an object detection unit (object detection unit 301: FIG. 3, FIG. 22, etc.) that detects a predetermined object (subject 301x) from an input image (for example, the image 9Ia (FIG. 22, FIG. 24), a second input image captured by the camera).
- it further includes a tracking unit (tracking unit 302) that tracks the detected object (subject 301x) and calculates the tracking object candidate region (the region 301xR of the previous video 93 included in the image 9Ia) where the object to be tracked is located, and a synthesizing unit (synthesizing unit 306) that synthesizes the image of the tracking object candidate region (region 301xR) calculated by the tracking unit (the previous video 93 (FIGS. 22, 24, etc.); the video 92 (FIGS. 12, 13, etc.)) at a fixed position (position 92P: FIG. 22, FIG. 12, FIG. 13, etc.) in the input image (the image 9Ib (FIG. 22, FIG. 24, etc.), a first input image).
- it also includes a display unit that displays the input image (the image 9C (FIGS. 22, 24, 12, 13, etc.)) containing the synthesized image (video 92) after the synthesis, and a selection unit that, when the user (user 1U: FIG. 22) performs an operation (operation 104L2 (FIGS. 22 and 13), a touch operation, etc.) on the synthesized image (video 92 (video 93)) displayed at the fixed position (position 92P), selects the subject (subject B) as the tracking target (tracking target 103xm: FIG. 22) in tracking in a predetermined process (such as AF processing).
- the tracking target selection device further includes a feature extraction unit that extracts predetermined features (the coordinates 904 (FIG. 9) and the like) from the image (video 93) of the target object candidate region (region 301xR) tracked by the tracking unit.
- it also includes a determination unit that calculates a predetermined state of the target object (video 93, subject 301x), such as the angle 3D1b (FIG. 11) or the direction (103x1, 103x2, etc.), and determines whether the calculated state is a predetermined state (for example, near 0 degrees, the direction 103x2 (FIG. 12), etc.).
- when the calculated state is determined to be the predetermined state, a storage unit (storage unit 305) stores the tracking target candidate region (the region 301xR, the image 93 of the region 301xR; the one used for the composition).
- storing an area means storing an image of the area.
- the display by the display unit may be controlled by a display control unit; that is, for example, a display control unit that performs this control may be configured as part or all of the display unit.
- the tracking target selection device is a camera (see FIG. 1 and the like) and includes an image pickup device (image pickup device 103) that captures one video (the one video 91) of a subject (for example, subject B in FIG. 25). The display unit displays the captured one video (one video 91) together with another video (the other video 92: FIG. 25, etc.) of the same subject (subject B). When an operation (operation 92L: FIG. 25, etc.) is performed on the displayed other video (the other video 92), the selection unit (selection unit 308) may select the subject (subject B) of the captured one video (one video 91) as the tracking target (tracking target 103xm: FIG. 22) in a predetermined process (such as AF processing).
- the video (one video 91) of the subject (subject B) that would be selected by an operation such as a touch is a video taken by the camera (imaging device 103), and it is difficult to predict which of various videos (FIGS. 2, 26 to 28, etc.) it will be. Nevertheless, an appropriate operation is possible: the other video 92, different from the one video 91, is displayed together with the captured one video 91, and the subject (subject B) is selected by an operation on the displayed other video 92. An operation on the other video 92 is sufficient, so the selection operation is easy to perform.
- the position (position 921PN) of the synthesized image (the other video 921N (921)) displayed at the second time (of the first time in the upper row and the second time in the lower row of FIG. 26) is the same as the position (position 921PM) of the other video (the other video 921M (921)) at the first time (the upper time): the common position 921P, not a different position.
- that is, the position 92P of the other video 92 does not move (change) across the plurality of times (the first time and the second time) but is stationary, i.e., common; the position 92P (the fixed position) is fixed.
- meanwhile, the position of the one video 911 moves from the position 911PM at the first time to the position 911PN at the second time (see the positions 911PM (upper) and 911PN (lower) in FIG. 26, or the positions 91a1P and 91a2P in FIG. 2). The user therefore does not need to identify, from a plurality of positions, the position at which the operation should be performed: just as the operation is performed at the position 921PM at the first time, at the second time the operation need only be performed at the same position 921PN (the common position 921P). Since the user does not have to pick the position out of several, the operation is simplified all the more.
- moreover, although the position 911PN of the one video 911N (911) at the second time (lower row) differs from the position 911PM at the first time (upper row), the position 921PN of the other video 921 at the second time is the same position (position 921P, the lower right corner) as the position 921PM at the first time, not a different one.
- thus, even though the position of the one video 911 changes, the other video 921 is kept at the appropriate position (the lower right corner) both at the first time (position 921PM) and at the second time (position 921PN), so the other video 921 can be reliably displayed at an appropriate position.
- in this way, several effects are achieved: the operation is simple, the degree of simplicity is sufficiently high, and the position of the synthesized image (the other video 92 (921)) is reliably appropriate.
- a tracking target selection device (tracking target selection device 1a) will be disclosed.
- FIG. 1 is a block diagram of the tracking target selection device.
- FIG. 3 is a functional block diagram of the tracking target selection device according to the first embodiment.
- in FIG. 1, a CPU (Central Processing Unit) 101 executes an image processing program (the computer program 1P) stored in a ROM (Read Only Memory) 102, which performs the processing according to the flowcharts shown in the drawings, and thereby realizes each element shown in FIG. 3.
- in a RAM (Random Access Memory) 105 and the external storage device 106, in addition to the area for the storage unit 305 shown in FIG. 3, a primary storage area required for the processing by the CPU 101 is secured.
- This apparatus includes an object detection unit 301, a tracking unit 302, a feature extraction unit 303, a determination unit 304, a storage unit 305, a synthesis unit 306, a display unit 307, and a selection unit 308.
- FIG. 4 is a flowchart of the tracking target candidate display method of the present invention.
- the tracking target candidate display method shown in FIG. 4 is realized by the apparatus shown in FIG.
- in the following, the target object is a person's face (see FIGS. 7 and 8, etc.), and the predetermined state is the face orientation (see columns (A) and (B) in FIG. 8, and the video 91c in FIG. 2).
- in step S401, the object detection unit 301 detects the position and size of each person's face in the image input from the image sensor 103.
- in step S402, object tracking is performed, treating each human face detected by the object detection unit 301 as a unique object (tracking unit 302).
- in step S403, features for calculating the orientation of the tracked face are extracted (feature extraction unit 303).
- in step S404, the orientation of the face is estimated from the features extracted in step S403 (for example, by the feature extraction unit 303).
- in step S405, it is determined whether the error of the face orientation estimated in step S404, relative to a predetermined state (for example, a front-facing face), is smaller than the past estimated error (determination unit 304).
- in step S406, if it was determined in step S405 that the estimated error is smaller than the past estimated error (S405: Yes), the estimated error stored in the storage unit 305 is updated with the smaller value.
- in step S407, as in step S406, the face image detected in step S401 is stored (updated) in the storage unit 305 together with the unique label attached to (corresponding to) that face image (see data 3D (FIG. 3)).
- in step S408, the face image stored in step S407 is synthesized by the synthesis unit 306 so that it is displayed at a fixed position (see the position 92P in FIGS. 12, 13, etc.), and the result is displayed on the display unit 307.
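- for illustration only, steps S401 to S408 might be wired together as follows (a sketch, not the patented implementation; the per-unit algorithms are passed in as callables since their concrete choice is left open):

```python
def display_loop(frames, detect_faces, track, orientation_error,
                 store, composite, show):
    """One rendition of S401-S408. `store` is a dict mapping each label to
    its smallest orientation error so far and the corresponding face image;
    `detect_faces`, `track`, `orientation_error`, `composite`, and `show`
    stand in for the respective units."""
    for frame in frames:
        faces = detect_faces(frame)                  # S401: positions + sizes
        for label, face_img in track(faces, frame):  # S402: label per face
            err = orientation_error(face_img)        # S403-S404: estimate
            past = store.get(label, (float("inf"), None))[0]
            if err < past:                           # S405: closer to frontal?
                store[label] = (err, face_img)       # S406-S407: update storage
        # S408: composite stored candidates at the fixed positions and display
        show(composite(frame, [img for _, img in store.values()]))
```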
- the object detection unit 301 detects a face candidate of the person that the user wants to track from the image (input image) input from the image sensor 103.
- as a concrete detection algorithm, for example, the AdaBoost-based algorithm disclosed in Japanese Patent Application Laid-Open No. 2006-350645 is used.
- the object detection method is not limited to this algorithm.
- the non-patent document PRMU 107 (206), pp. 211-224, also describes the possibility of detecting general objects. That is, the present technique is not limited to a person's face as the target object, but can be extended to general objects.
- when the object detection unit 301 is configured as, for example, one system LSI (Large Scale Integration) capable of real-time processing (for example, processing 30 or more frames per second), the tracking unit 302 may perform the following processing: an object that has once been detected is likely to be detected, in the next frame, in the vicinity of the position where it was detected in the previous frame, so in the real-time case described above the tracking unit 302 may exploit this continuity of detection positions.
- FIG. 6 is a flowchart of the tracking unit.
- in step S601, it is determined whether an object was present, one frame earlier, in the vicinity of the position of the detected object.
- in step S602, if it is determined in step S601 that an object existed in the vicinity one frame earlier (S601: Yes), the detected object is treated as the same object as that one, and the history of its detected coordinates is updated.
- in step S603, if it is determined in step S601 that no object existed in the vicinity one frame earlier (S601: No), the detected object is assigned a unique label (a new label) different from every label assigned to previously detected objects.
- in step S604, the detection coordinates of the newly detected object are newly added to the history of detected coordinates.
- the tracking method using position continuity has been described.
- however, the method used may instead be an object tracking method using color, one using face matching, or another method.
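- for illustration only, a minimal sketch of the position-continuity labelling of steps S601 to S604 (hypothetical code; `max_dist`, integer labels, and the one-detection-per-track simplification are assumptions):

```python
def update_tracks(detections, histories, max_dist=40.0):
    """Nearest-neighbour labeller: a detection near some track's last
    position continues that track (S601/S602); otherwise it gets a fresh
    unique label and starts a new history (S603/S604). `histories` maps an
    int label to a list of (x, y) detection coordinates."""
    for (x, y) in detections:
        nearest, best = None, max_dist
        for label, coords in histories.items():
            px, py = coords[-1]
            d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if d < best:                       # S601: object nearby one frame ago?
                nearest, best = label, d
        if nearest is not None:
            histories[nearest].append((x, y))  # S602: update that object's history
        else:
            new_label = max(histories, default=-1) + 1
            histories[new_label] = [(x, y)]    # S603/S604: new label, new history
    return histories
```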
- FIG. 7 shows the result of tracking.
- FIG. 8 is a diagram for explaining the object orientation calculation processing by the feature extraction unit according to the first embodiment of the present invention.
- as features, five items are extracted: the left and right eye positions (for example, the coordinates of the two eyes 801 and 802 in column (A) of FIG. 8), the face center coordinates (for example, the coordinates 803), the nose position coordinates (for example, the coordinates 804), and the face size (Size in FIG. 8).
- both eyes can be detected by an algorithm similar to the human face detection algorithm.
- the method used in this detection may be, for example, an edge-based method using a corner detection algorithm, or other methods.
- FIG. 9 is a diagram for explaining the processing for calculating the face center coordinates in the first embodiment of the present invention.
- the face detector (face detector 303a) is composed of a plurality of face-orientation-specific detectors (detectors 303a1 to 303a3, etc.). At one place where a face exists, each of the detectors (detector 303a1, etc.) outputs a plurality of candidate frames differing in position and size (see the three frames 901 output by the detector 303a1, the three frames 902 by the detector 303a2, and the three frames 903 by the detector 303a3).
- the average of the center coordinates of these candidate frames and the average of their sizes are calculated, and the center coordinates obtained by integrating these pieces of information are taken as the face center coordinates (coordinates 904). By using all of the frames 901, 902, and 903 from the detectors 303a1 to 303a3 in this way, the face center coordinates 904 can be calculated with relatively high accuracy.
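- for illustration only, the integration of candidate frames into the face center coordinates might look as follows (a sketch; the frame values in the usage example are made up):

```python
import numpy as np

def integrate_candidate_frames(frames):
    """Integrate the candidate frames (cx, cy, size) that the face-orientation
    detectors output for one face: the averaged center plays the role of the
    face center coordinates (cf. coordinates 904), the averaged size that of
    the face size."""
    arr = np.asarray(frames, dtype=float)   # rows: (cx, cy, size)
    cx, cy, size = arr.mean(axis=0)
    return (cx, cy), size

# e.g. three detectors times three frames each, as in FIG. 9:
center, face_size = integrate_candidate_frames(
    [(100, 80, 52), (103, 82, 50), (98, 79, 54),     # detector 303a1
     (101, 81, 51), (99, 80, 53), (102, 78, 50),     # detector 303a2
     (100, 82, 52), (97, 80, 51), (101, 79, 53)])    # detector 303a3
```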
- FIG. 10 is a diagram for explaining the process of calculating the nose position coordinates (see coordinates 1004) in the first embodiment of the present invention.
- as in the calculation of the face center coordinates, each of a plurality of face orientation detectors (for example, detectors 303b1 to 303b3) outputs a plurality of candidate frames differing in position and size (see FIG. 9).
- to each output candidate frame, processing corresponding to the detector that output it is applied: a fixed offset, determined per detector and normalized by the face size, is added to the frame, which corrects the center coordinates of the frame to the nose position (see the corrected frames 1001 to 1003 in FIG. 10).
- the average of the center coordinates and the average of the sizes of the offset candidate frames (corrected frames 1001 to 1003) are then calculated, and the center coordinates obtained by integrating these results are taken as the nose position coordinates (coordinates 1004). The calculation thus uses the output of each face detector.
- alternatively, the nose position may be detected directly by the same kind of algorithm as the human face detection.
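- for illustration only, a sketch of the offset correction and integration (hypothetical code; the per-detector offsets normalized by face size are assumed to be known calibration constants):

```python
import numpy as np

def nose_position(frames_by_detector, offsets):
    """Correct each detector's candidate frames by that detector's fixed
    offset (scaled by face size) so their centers land on the nose, then
    average (cf. corrected frames 1001-1003 and coordinates 1004).
    `offsets[d]` is the (dx, dy) offset of detector d per unit face size."""
    corrected = []
    for d, frames in frames_by_detector.items():
        ox, oy = offsets[d]
        for (cx, cy, size) in frames:
            corrected.append((cx + ox * size, cy + oy * size, size))
    arr = np.asarray(corrected, dtype=float)
    nx, ny, size = arr.mean(axis=0)
    return (nx, ny), size
```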
- the determination unit 304 will be described using the example of determining the face orientation of a person.
- the left and right eye position coordinates, face center coordinates, nose position coordinates, and face size have already been obtained by the feature extraction unit 303 before the following processing is performed.
- the difference between the X components (Face_x and Nose_x in Equation 1) of the face center position (for example, the coordinates 904 in FIG. 9) and the nose position (for example, the coordinates 1004 in FIG. 10) is normalized by the face size (FaceSize) to give the normalized difference amount of Equation 1: F_n = (Face_x - Nose_x) / FaceSize.
- alternatively, more facial feature points may be acquired and, for example, an algorithm that calculates the face orientation more accurately from their geometric positional relationship may be used.
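- for illustration only, Equation 1 and the front-face test might be computed as follows (a sketch; the tolerance value is an assumption, as the text fixes no concrete threshold):

```python
def normalized_face_nose_offset(face_x, nose_x, face_size):
    """Equation 1: F_n = (Face_x - Nose_x) / FaceSize. Near zero for a
    front-facing face; its magnitude grows as the face turns sideways."""
    return (face_x - nose_x) / face_size

def is_frontal(face_x, nose_x, face_size, tol=0.05):
    """Front-face test with a hypothetical tolerance."""
    return abs(normalized_face_nose_offset(face_x, nose_x, face_size)) < tol
```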
- in the following, the predetermined direction is the front-facing direction (see the direction 103x2 in FIG. 12, etc.).
- when the calculated value indicates a front face, the face is determined to be front-facing, and the storage unit 305 stores the image determined to be front-facing together with the calculated face orientation angle and the assigned label (see data 3D in FIG. 3).
- thereafter, each newly calculated face orientation value of the same tracking target is compared with the stored calculated value, and if the new value is closer to the front-facing direction, the following processing is performed: the front-facing image and the face orientation angle value stored in the storage unit are updated with the image for which the new value was calculated and with that new value, respectively.
- FIG. 11 is a diagram for explaining a storage unit according to the first embodiment of the present invention.
- as shown in FIG. 11, the storage unit stores, for each tracking object (face), the image in the front-facing direction (images 3D1a to 3D3a), the calculated face orientation value (values (scores) 3D1b to 3D3b), and the assigned label (labels 3D1c to 3D3c).
- the label (label 3D1c, etc.) is, for example, information that identifies the subject (subject A) of the image to which it is assigned (for example, the image 3D1a) from among the plurality of subjects (subjects A to C).
- the synthesizing unit 306 normalizes the size of the tracking object image stored in the storage unit and synthesizes the normalized image with the input image: for example, an image whose size has been normalized (changed) to a predetermined size is generated from the stored image and combined with (a part of) the input image.
- the place where it is combined is preferably one that does not obscure the shooting scene, for example one of the four corners at the top or bottom of the screen (see the lower right corner in FIG. 12).
- alternatively, the normalized image may initially not be displayed, with only the original input image shown. Only when a user operation requests it is the normalized image synthesized at some place (such as the lower right corner) and a combined image containing both the original input image and the normalized image generated and displayed; that is, the normalized image may be shown only on instruction.
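- for illustration only, a minimal sketch of this normalization and corner composition (hypothetical code; the thumbnail size and margin are arbitrary, and a nearest-neighbour resize stands in for whatever normalization the synthesizing unit 306 actually uses; the input image is assumed to be larger than thumbnail plus margin):

```python
import numpy as np

def composite_at_corner(input_image, stored_image, thumb=64, margin=8):
    """Normalize the stored tracking-object image to a fixed size and paste
    it at the lower right corner of the input image (the fixed position)."""
    h, w = stored_image.shape[:2]
    ys = np.arange(thumb) * h // thumb          # nearest-neighbour row indices
    xs = np.arange(thumb) * w // thumb          # nearest-neighbour col indices
    normalized = stored_image[ys][:, xs]
    out = input_image.copy()
    H, W = out.shape[:2]
    out[H - margin - thumb:H - margin, W - margin - thumb:W - margin] = normalized
    return out
```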
- furthermore, a label (see the letter "B" shown in the vicinity of the video 91 in FIG. 12) may be rendered near the position of the object being tracked (for example, the position of the video 91 in FIG. 12).
- this lets the user grasp (relatively easily) the correspondence between the target object to be selected (the normalized, synthesized image of the subject; see the video 92) and the actual position at which the subject appears in the input image.
- FIG. 12 is a diagram for explaining the display unit in the first embodiment of the present invention.
- the display unit 307 displays the input image with which the combining unit 306 has combined the image stored in the storage unit.
- FIG. 12 shows an example in which the stored image and the input image are synthesized and an image 9C generated by the synthesis is displayed.
- FIG. 5 is a flowchart of the tracking target selection process showing the functional configuration of the tracking target selection device according to the first embodiment of the present invention.
- in step S501, the tracking target candidate image stored in the storage unit is displayed at a fixed position.
- the display may be performed at a first time (for example, the upper time in FIG. 26) (S501a) and then, at a second time, at the same position (the position 921PN) as the position displayed at the first time (for example, the position 921PM) (S501b), so that the image is displayed at a fixed position.
- in step S502, the user selects the target to be tracked (for example, the subject B in FIG. 12) by an operation (such as a touch) on the synthesized and displayed image at the fixed position (for example, the video 92b in FIG. 12), and the device 1 accepts this selection.
- FIG. 13 is a diagram for explaining the selection process according to the first embodiment of the present invention.
- at the fixed position (position 92P, etc.), the user can touch, from among the plurality of face images (the front-facing face images (video 92) of the three subjects A to C), the face image of the target to be tracked and thereby select that target (the subject, for example subject B), so no erroneous target selection occurs.
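- for illustration only, the selection in step S502 amounts to a hit test of the touch position against the fixed thumbnail slots (a sketch; the slot layout and labels are made up):

```python
def select_by_touch(touch_xy, slots):
    """Map a touch at the fixed display position to a tracking label.
    `slots` is [(label, (x, y, w, h)), ...], one rectangle per candidate
    thumbnail shown at the fixed position; returns the touched label or None."""
    tx, ty = touch_xy
    for label, (x, y, w, h) in slots:
        if x <= tx < x + w and y <= ty < y + h:
            return label
    return None

# e.g. three thumbnails along the bottom edge, as in FIG. 13:
slots = [("A", (10, 420, 64, 64)),
         ("B", (84, 420, 64, 64)),
         ("C", (158, 420, 64, 64))]
assert select_by_touch((100, 450), slots) == "B"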
- note that the images 92 stored in the storage unit may also be displayed on the screen 104R without being combined with the input image.
- FIG. 14 is a diagram for explaining another example in the first embodiment of the present invention.
- the face of a person has been described as an example.
- however, the device may also be configured for a general object other than a person, such as a car.
- furthermore, the feature extraction unit may extract, as features, the edge and frequency components of the face image used for smile determination. The determination unit may then determine the smile level from the extracted features, store it in the storage unit, and output the scene as a still picture. In other words, for example, a face image whose smile level, determined from the extracted features, is relatively high is output, and an image synthesized from the scene of that face image may be output as the video (see the video 92).
- FIG. 15 is a functional block diagram of the tracking target selection device (device 1b) using character information in the second embodiment.
- This apparatus includes an object detection unit 1501, a tracking unit 1502, a feature extraction unit 1503, a character recognition unit 1504, a storage unit 1505, a synthesis unit 1506, a display unit 1507, and a selection unit 1508.
- FIG. 16 is a flowchart up to display processing showing the functional configuration (processing configuration) of the tracking target selection device according to Embodiment 2 of the present invention.
- in step S1604, features necessary for character recognition are extracted from the image of the tracked target object candidate (feature extraction unit 1503).
- in step S1605, it is determined whether characters have already been recognized for the target object candidate being tracked.
- in step S1606, if character recognition has not yet been performed, character recognition is performed based on the extracted features (character recognition unit 1504).
- in step S1607, it is determined whether the character recognition succeeded.
- in step S1608, if the character recognition failed (S1607: No), the image of the tracking object is stored.
- in step S1609, if the character recognition succeeded (S1607: Yes), the recognized characters are stored (storage unit 1505).
- in step S1610, the tracking target candidate images and characters stored in the storage unit are combined with the input image and displayed at a fixed position.
- the character recognition unit 1504 recognizes unique character information that the tracking target object has.
- the recognized character information is, for example, the information on a car's license plate.
- the storage unit 1505 stores both the tracking target image and the recognized character information (see data 3D in FIG. 3).
- the synthesizing unit 1506 combines the tracking target candidate image and/or the character information with the input image (see FIG. 17 described later), and the combined image is displayed on the display unit 1507.
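- for illustration only, a sketch of the bookkeeping of steps S1604 to S1610 (hypothetical code; `recognize` stands in for the character recognition unit 1504 and is assumed to return the recognized string, or None on failure):

```python
class PlateStore:
    """Per tracked label, keep the candidate image and, once character
    recognition succeeds, the recognized string."""

    def __init__(self, recognize):
        self.recognize = recognize
        self.entries = {}  # label -> {"image": ..., "text": str or None}

    def offer(self, label, image):
        entry = self.entries.setdefault(label, {"image": image, "text": None})
        if entry["text"] is None:                  # S1605: not yet recognized
            entry["text"] = self.recognize(image)  # S1606/S1607: attempt OCR
            entry["image"] = image                 # S1608/S1609: store either way
        return entry
```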
- FIG. 17 is a diagram for explaining an example of display in the second embodiment of the present invention.
- a display as shown in FIG. 17 may be displayed.
- FIG. 18 is a functional block diagram of the tracking target selection device (device 1c) using a registration DB according to the third embodiment.
- This apparatus includes an object detection unit 1801, a tracking unit 1802, a feature extraction unit 1803, a similarity calculation unit 1804, a storage unit 1805, a synthesis unit 1806, a display unit 1807, a selection unit 1808, and a registration DB 1809.
- DB: database
- FIG. 19 is a flowchart up to display processing showing the functional configuration (processing configuration) of the tracking target selection device according to Embodiment 3 of the present invention.
- portions after the determination unit (portions after S1905) will be described in detail.
- in step S1905, the features (feature 1803a: FIG. 18) extracted from the image of the tracked target object candidate are matched against the features (feature 1809a) registered in advance in the registration DB, and the similarity between them (similarity 1804a: FIG. 18) is calculated.
- in step S1906, the similarity (similarity 1804a) calculated in S1905 is compared with the past similarity (see data 3DW (FIG. 18)).
- in step S1907, if the similarity calculated in step S1905 is higher than the past similarity (the similarity of data 3DW) (S1906: Yes), the stored similarity is updated.
- in step S1908, likewise when the value is higher than the past similarity (S1906: Yes), the stored image of the tracking object is updated.
- in step S1909, it is determined whether the similarity calculated in step S1905 is higher than a certain threshold.
- in step S1910, if the similarity is higher than the threshold (S1909: Yes), the additional information associated with the entry in the registration DB is also stored in the storage unit.
- in step S1911, the tracking target candidate image and the additional information stored in the storage unit are combined with the input image (see the video 92e in FIG. 17) and displayed at a fixed position in the input image.
- the registration DB 1809 is a database in which a face image of a specific person and additional information (person name, etc.) are registered in advance.
- the similarity calculation unit 1804 matches the feature extracted by the feature extraction unit 1803 (feature 1803a: FIG. 18) against the features registered in the registration DB 1809 (feature 1809a). If the resulting similarity (similarity 1804a) is higher than the previous similarity (the similarity of data 3DW) (S1906: Yes in FIG. 19), the similarity and the tracking object image stored in the storage unit are updated. Furthermore, when the similarity exceeds the threshold (S1909: Yes), the additional information associated with the registration DB entry is also stored in the storage unit.
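- for illustration only, a sketch of this matching and update logic (hypothetical code; cosine similarity, the threshold value, and a non-empty DB are assumptions, as the text does not fix the matching score):

```python
import numpy as np

def cosine_similarity(a, b):
    """One common matching score between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def update_candidate(feature, image, registered, best, threshold=0.8):
    """S1905-S1910: match the tracked candidate's feature against the
    registration DB (`registered`: name -> feature vector), update the
    stored best entry when the similarity improves, and attach the
    additional information (the name) only above the threshold."""
    name, sim = max(((n, cosine_similarity(feature, f))
                     for n, f in registered.items()), key=lambda t: t[1])
    if sim > best.get("similarity", -1.0):                   # S1906-S1908
        best.update(similarity=sim, image=image,
                    info=name if sim > threshold else None)  # S1909/S1910
    return best
```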
- when there is additional information together with the tracking target image, the synthesizing unit 1806 also synthesizes the additional information with the input image (as described above), and the result is displayed on the display unit.
- FIG. 20 is a diagram for explaining an example of display in the third embodiment of the present invention.
- a display example is shown in FIG. 20.
- the target object selecting device is a computer system including a central processing unit (CPU), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
- a computer program is stored in the RAM.
- Each device achieves its function by the CPU operating in accordance with the computer program.
- the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.
- a system LSI is an ultra-multifunctional LSI that is manufactured by integrating a plurality of components on a single chip.
- the system LSI is a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is stored in the RAM, and the system LSI achieves its functions by the microprocessor operating according to the computer program.
- the devices may also be realized as an IC (Integrated Circuit) card or a single module.
- the IC card or the module is a computer system including a microprocessor, a ROM, a RAM, and the like.
- the IC card or the module may include the super multifunctional LSI described above.
- the IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
- the present invention may be the method described above. Further, the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of the computer program.
- the present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium such as a flexible disk, a hard disk, a CD-ROM (Compact Disc-ROM), an MO (Magneto-Optical disc), a DVD (Digital Versatile Disc), a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc), or a semiconductor memory.
- the digital signal may be recorded on these recording media.
- the present invention may also be a method of transmitting the registration data, the computer program, or the digital signal via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.
- the present invention may be a computer system including a microprocessor and a memory, the memory storing the computer program, and the microprocessor operating according to the computer program.
- the program or the digital signal may be recorded on the recording medium and transferred, or transferred via the network or the like, so that the invention may be implemented by another independent computer system.
- as described above, so that the tracking target can be selected accurately even if it is moving (see column (A) in FIG. 2) or its visibility is poor (see column (B)) when the user selects it, the device includes: an object detection unit that detects a predetermined object from the input image; a tracking unit that performs tracking by identifying the detected object as the same object (even at different times); a determination unit that determines whether the object is in a predetermined state; a storage unit that stores a score representing the state of the object that the determination unit decided to store, together with the image of the object being tracked (the image in the state, e.g. face direction, corresponding to that score); a synthesis unit that synthesizes the image of the object stored in the storage unit (the image in the state of that score) at a fixed position in the input image; a display unit that displays the input image after the synthesis; and a selection unit that selects a detected object displayed at the fixed position as the tracking target in tracking in a predetermined process.
- the subject may be a pet such as a cat or dog.
- in that case, the image synthesized and displayed at the fixed position (position 92P in FIG. 12; see, for example, the other video 92) may show the texture of the pet's fur (its color, pattern, fur quality, etc.).
- a car (subject 103x) running on the road 103R may be monitored.
- for a subject (e.g., subject B in FIG. 26), a first video (the one video 91) and a second video (the other video 92) are displayed.
- the direction (direction 92bd) of one subject (e.g., subject B in FIG. 12) in its second video (the other video 92b) may be the same as the direction (direction 92md) of another subject (for example, subject C) in that subject's second video (the other video 92m).
- the object displayed at the fixed position does not necessarily have to face the same direction (orientation).
- the direction 92bd in the other image 92b of the subject B in FIG. 12 is the same direction as the direction 92md in the other image 92m of the subject C.
- the object displayed at the fixed position may face the same direction.
- the determination unit calculates a score (the angle 3D1b (FIG. 11), etc.) indicating the state (the direction of the photographed subject, etc.), and the storage unit may store the image together with the score.
- the determination unit compares the calculated score (the angle 3Dxb (FIG. 11)) with the score stored in advance in the storage unit (the angle 3D1b) and, based on the comparison, may decide to update the storage unit with the calculated score (angle 3Dxb) and the image in the state indicated by it (image 3Dxa).
- when the determination unit decides to update, the previously stored score (angle 3D1b) and the image stored in association with it (image 3D1a) may each be replaced with the calculated score (angle 3Dxb) and its image (image 3Dxa).
- the feature extraction unit extracts a feature amount (the coordinates 904 (FIG. 9), etc.) indicating the direction (one of the direction 103x1, direction 103x2, etc. in FIG. 12) appearing in the image (video 93: FIG. 22, etc.) of the tracking object candidate region. Based on the extracted feature amount, the determination unit determines whether the indicated direction is a predetermined direction (for example, the direction 103d (FIG. 12)), and when it is, the tracking object candidate region from which that feature amount was extracted (the image 3Dxa (FIG. 11), the region 301xR in FIG. 22 where the image 3Dxa was present, etc.) may be stored in the storage unit.
- storing an area means storing an image of the area.
- the tracking object candidate region (region 301xR: FIG. 22) is a region of a person's face (face 8F: FIG. 8). The feature extraction unit extracts the face center coordinates (for example, coordinates 807: FIG. 8), the nose position coordinates (coordinates 808), the eye position coordinates (coordinates 805 and 806), and the face size (Size).
- the determination unit may then determine (as described above) whether the face is in the predetermined direction (the direction 103d in FIG. 12) from two (unsigned) differences: the difference between the face center coordinates (coordinates 807) and the nose position coordinates (coordinates 808), and the difference between the face center coordinates (coordinates 807) and the center (coordinates 805a) of the two eye position coordinates (coordinates 805, 806).
- the feature extraction unit extracts a feature amount indicating the facial expression of the person in the tracking object candidate region (region 301xR: FIG. 22), and the determination unit may determine from the extracted feature amount whether the person's facial expression is a smiling expression.
- the determination as to whether or not the expression is a smile may be made, for example, by a process using a known technique.
- the feature extraction unit extracts a feature amount (for example, the positions and directions of character edges) necessary for character recognition from the object in the tracking object candidate region (region 301xR), and the determination unit may determine, based on the extracted feature amount, whether the characters appearing on the object have been recognized.
- the feature extraction unit extracts a feature amount necessary for object recognition from the object in the tracking object candidate region (region 301xR), and the determination unit may perform matching between features registered in advance in the storage unit and the feature indicated by the extracted feature amount.
- object recognition refers to identifying the same object as the object in the tracking object candidate area from among a plurality of objects.
- in addition to the image of the tracking object candidate region (region 301xR) (such as the video 92 in FIG. 20), the synthesis unit may also synthesize the additional information obtained by the determination unit (an image 92N of a label (name, etc.)).
- the display of the other video 92 at the upper time of FIG. 26 may be performed at S501a in FIG. 5, and the display at the lower time may be performed at S501b.
- selection data 308d (FIG. 22) specifying the subject selected as the tracking target 103xm (FIG. 22, etc.) may be generated, so that the subject specified by the generated data 308d is selected as the tracking target 103xm.
- the size (size 92S) of the subject (subject B) in the displayed other video is equal to or larger than a predetermined threshold (threshold Th); that is, the size is not smaller than the threshold (unlike the size of the video 91b in FIG. 2 or the small size 912S in FIG. 27).
- this makes it easy to identify, among the plurality of photographed subjects, the one that is the same subject (subject B) as the subject of the displayed other video 92 (for example, subject B in FIG. 25).
- the user can thus easily judge that performing the operation 92L on the other video 92 is appropriate, and the operation is simplified all the more.
- the threshold Th may be, for example, the largest of those sizes at which, when the photographed subject's image (the video 91b in FIG. 2, the video 912 in FIG. 27) is that size or smaller, identifying the subject is not easy but difficult.
- the direction of the subject (the direction 92d in FIG. 28) in the displayed other video (the other video 92 in FIG. 25) is the same as the predetermined direction (the direction 103d in FIG. 12), namely the direction 103x2 facing the image sensor 103 (camera 1), and not a different direction (the direction 103x1, the direction 913d in FIG. 28, the direction in the video 91c in FIG. 2 (backward, left-rearward, etc.)).
- here, the direction of the subject refers to the direction in which the surface bearing many of the subject's features, such as its front side, faces (the direction 92d in FIG. 28, the direction 92dd in FIG. 14, etc.).
- the predetermined direction described above is, for example, the same direction as the direction 103d (FIG. 12) facing the image sensor 103 (or close to the direction 103d, in its vicinity).
- even when, for example, the one video 911 in FIG. 26 moves from the position 911PM (upper row) to the position 911PN (lower row), the photographed subject's size is at or below the threshold Th (see FIG. 25 and the small size 912S in FIG. 27), and the direction of the photographed subject (subject B) differs from the predetermined direction (the direction 103d in FIG. 12) (the direction 103x1; see the left-rearward direction in the one video 911 in FIG. 26), the other video 921 (FIG. 26) may still be displayed with a position that does not move (see the position 92P in FIGS. 25 and 26), the large size 92S (see FIG. 25), and the same direction as the predetermined direction (the direction 103x2 in FIG. 12).
- the image pickup device captures, in addition to the first video (the one video 91 in FIG. 28), a second video (the one video (second video) 91 in FIG. 24) and a previous video (the previous video 93x in FIG. 24) in which the same subject (subject B in FIG. 28) appears facing the same direction (the direction 103x2) as the predetermined direction (the direction 103d in FIG. 12).
- the display unit may then display, as the other video whose direction 92d matches the predetermined direction (the other video 92 in FIG. 28; direction 103x2, FIG. 12), an image generated (by the combining unit 306 (FIG. 24, FIG. 3, etc.)) from information on the captured previous video (the previous video 93x) (the information 3D in FIG. 24 (FIG. 3)).
- in this way, the information for displaying the other video 92 (the information 3D: FIG. 24, FIG. 3, etc.) is obtained simply by capturing the previous video 93x (FIG. 24), so the other video 92 can be displayed easily.
- from a plurality of previous videos 93, including an appropriate previous video 93 with the direction 103x2 (the previous video 93x: FIG. 24) and an inappropriate previous video 93 with the direction 103x1 (see FIG. 24), the appropriate previous video 93 (the previous video 93x) may be selected and used.
- the display unit may display, as the other video (the other video 92), an image (the other video 92e) indicating the subject's character string (for the car C in FIG. 17, "Nara 330xx-oo", the character string 92e2) in addition to the captured one video (the one video 91e).
- the synthesizing unit performs character recognition on a video (the video 93 in FIG. 17) other than the one video (the one video 91e) and the other video (the other video 92e), generates a video (the other video 92e) in which the identified characters ("Nara 330xx-oo", the character string 92e2) are synthesized with that video (the video 93), and the display unit may display the generated video (the other video 92e) as the other video (the other video 92).
- characters conveying information, such as the name of the subject, may also be displayed.
- a character recognition unit 1504 (FIG. 15 and the like) that performs the above-described character recognition may be provided; it may be a part of the determination unit 304 or may be provided outside the determination unit 304.
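A minimal sketch of compositing such a recognized string, assuming Pillow and treating the string as already produced by the character recognition unit 1504 (the position, padding, and colors are illustrative assumptions):

```python
from PIL import Image, ImageDraw

def composite_text_badge(frame: Image.Image, text: str,
                         fixed_pos=(20, 20)) -> Image.Image:
    """Draw a recognized character string (such as 'Nara 330xx-oo') on a frame.

    `text` stands for the output of the character recognition unit 1504;
    this sketch only handles the compositing step, not the recognition.
    """
    out = frame.copy()
    draw = ImageDraw.Draw(out)
    x, y = fixed_pos
    # background box behind the string so it stays legible over the scene
    left, top, right, bottom = draw.textbbox((x, y), text)
    draw.rectangle((left - 4, top - 4, right + 4, bottom + 4), fill="black")
    draw.text((x, y), text, fill="white")
    return out
```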
- the video 9W in FIG. 23 may be captured as the one video 91 described above. This video 9W does not move (for a predetermined time), has the large size 92S, and, as with the other video 92 shown in FIG. 25, may be an image in which the direction of the imaged subject is the same as the predetermined direction (direction 103x2 in FIG. 12).
- whether the captured one video 91 is such a video 9W or a video that is not the video 9W (the one video 911 in FIG. 26, the one video 912 in FIG. 27, or the one video 91 in FIG. 28), the captured video 91 (the video 9W, the video 912, or the like) may be displayed, and the other video 92 may be displayed as well.
- this camera is, for example, a consumer digital camera: the user who purchased it images whatever subject he or she desires, so the kind of subject image that will be captured cannot be predicted (or is difficult to predict).
- alternatively, this camera may be a camera such as a surveillance camera, which captures a subject 103x that appears by chance, such as a car running on the road 103R.
- a display (thick line) 91X indicating that the operation 104L2 has been performed may be shown, of the position 91P of the one video 91 and the position 92P of the other video 92, only at (in the vicinity of) the position 91P of the one video 91, and not at (in the vicinity of) the position of the other video 92.
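This asymmetric highlight might be sketched as follows, again assuming Pillow; the rectangle stands in for the thick-line display 91X, and the color and line width are illustrative:

```python
from PIL import Image, ImageDraw

def draw_selection_highlight(frame: Image.Image, live_box) -> Image.Image:
    """Draw the thick-line display 91X around the live image 91 only.

    `live_box` is (x0, y0, x1, y1) at position 91P; the copy of the subject
    at the fixed position (the other video 92) is deliberately left unmarked.
    """
    draw = ImageDraw.Draw(frame)
    draw.rectangle(live_box, outline="yellow", width=5)  # the display 91X
    return frame
```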
- the other video 92 displayed at the fixed position may be a video 9X that contains no image of the appearance of the subject (see the image 92e1 in FIG. 17) and represents only the character string 9X1 identified by character recognition (see the character string 92e2 of the video 92e in FIG. 17).
- the subject (car B) is thereby identified from among the plurality of subjects (car A to car C), and the other image 9X (showing the subject's character string "Osaka 550 nao-xx") is displayed together with the captured one image 91e, so that the operation is easy.
- the display at the fixed position sufficiently simplifies the operation, and the displayed position can reliably be made appropriate.
- when the simple display using only the character string 9X1 is performed, the display is easy to understand, and an even more adequate display can be achieved.
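The fixed-position compositing itself can be sketched in a few lines, assuming Pillow; the thumbnail size and the fixed coordinates are assumptions standing in for the size 92S and the position 92P:

```python
from PIL import Image

THUMB_SIZE = (96, 96)  # assumed size of the composited other video 92 (size 92S)
FIXED_POS = (10, 10)   # assumed fixed position, so position 92P never moves

def composite_at_fixed_position(live_frame: Image.Image,
                                best_view: Image.Image) -> Image.Image:
    """Paste the stored best view of the subject at the same place every frame,
    so the user can tap one unchanging spot no matter where the subject moves."""
    out = live_frame.copy()
    out.paste(best_view.resize(THUMB_SIZE), FIXED_POS)
    return out
```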
- the position of the one image 91 (one image 911) of subject B (position 911PM, position 911PN) may have, at a predetermined first time (upper stage), a first positional relationship 921JM (a relationship of being on the right side) with respect to another position (the position 921XM of the captured one image 91 of another subject, subject A), and, at a second time (lower stage), a second positional relationship 921JN (a relationship of being on the left side) with respect to the position (position 921XN) of the image of the other subject (subject A) at the second time.
- in contrast, the position (position 921P) of the other image 92 (the other image 921) of subject B may have the same positional relationship 922J (the relationship of being on the right side) at both the first time and the second time with respect to the position (position 922X (922XM, 922XN)) of the synthesized other image 92 of the other subject (subject A).
- since the positional relationship 922J (on the right side) stays the same, an operation at a position of a different positional relationship (not shown; for example, on the left side) is unnecessary (see the positional relationship 922J in the lower stage), and the operation can be performed more reliably with the unchanged positional relationship 922J (on the right side), so the operation is easy.
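One way to obtain the constant positional relationship 922J is to lay the composited images out in an order keyed by something stable across frames, such as a track id; the following sketch is an assumed illustration, not the device's actual layout rule:

```python
def layout_candidates(track_ids, origin=(10, 10), step=110):
    """Assign each tracked subject a fixed slot, left to right, by track id.

    Because the ordering key is stable across frames, subject B keeps the
    same left/right relationship to subject A at every time (as with the
    constant relationship 922J), unlike the live images, whose relationship
    can flip from 921JM to 921JN as the subjects move.
    """
    x0, y0 = origin
    return {tid: (x0 + i * step, y0) for i, tid in enumerate(sorted(track_ids))}
```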
- a part (or all) of the tracking target selection device (camera) 1 may be a computer 1C (FIG. 1) including the CPU 101 (FIG. 1). Then, the computer 1C may execute the computer program 1P (FIG. 1, for example, the above-described image processing program), thereby realizing one or more functions described above.
- the computer program 1P may be stored in the ROM 102, for example.
- alternatively, an integrated circuit 1L (FIG. 1) on which one or more of the functions described above are mounted may be constructed by configuring an appropriate circuit.
- Reference numeral 705 indicates an image, at time T + α, of the same object as the object labeled A at time T.
- Reference numeral 706 indicates an image, at time T + α, of the same object as the object labeled B at time T.
- Reference numeral 707 indicates an image of the newly detected object with the label C attached.
- Reference numeral 901 indicates a face detection candidate frame output by the right 75 degree face detector.
- Reference numeral 902 indicates a face detection candidate frame output by the right 30 degree face detector.
- Reference numeral 903 indicates a face detection candidate frame output by the front face detector.
- Reference numeral 1001 indicates a result of adding an offset to the face detection candidate frame output by the right 75 degree face detector.
- Reference numeral 1002 indicates the result of adding an offset to the face detection candidate frame output by the right 30 degree face detector.
- Reference numeral 1003 indicates a result obtained by adding an offset to the face detection candidate frame output by the front face detector.
- Reference numeral 104L2 indicates an operation of selecting an object at a fixed position.
- the tracking target selection apparatus, method, and storage medium according to the present invention make it possible, when shooting with a digital camera or digital video camera, to easily select and track a subject in various shooting scenes and, by controlling AF/AE, to shoot easily and without failure, which is useful.
Abstract
Description
In the first embodiment, a tracking target selection device (tracking target selection device 1a) is disclosed.
FIG. 15 is a functional block diagram of the tracking target selection device using character information (device 1b) according to the second embodiment.
FIG. 18 is a functional block diagram of the tracking target selection device using character information (device 1c) according to the third embodiment.
102 ROM
103 Camera
104 Display
105 RAM
106 External storage device
108 Interface device
301 Object detection unit
302 Tracking unit
303 Feature extraction unit
304 Determination unit
305 Storage unit
306 Synthesis unit
307 Display unit
308 Selection unit
701 Image frame at time T
702 Image to which label A is assigned
703 Image to which label B is assigned
704 Image frame at time T + α
801 Right eye of a front-facing face
802 Left eye of a front-facing face
803 Face center coordinates of a front-facing face
804 Nose coordinates of a front-facing face
805 Right eye of a right-facing face
806 Left eye of a right-facing face
807 Face center coordinates of a right-facing face
808 Nose coordinates of a right-facing face
904 Face center coordinates
1004 Nose position coordinates
1200 Display screen
1201 Composited image
1302 Selected object
9X1 Recognized characters
1809 Registration database (DB)
Claims (19)
- A tracking target selection device for selecting a tracking target, comprising:
an object detection unit that detects a predetermined object from an input image;
a tracking unit that tracks the object detected by the object detection unit and calculates a tracking object candidate region in which the tracked object is present;
a synthesis unit that synthesizes an image of the tracking object candidate region calculated by the tracking unit at a fixed position in the input image;
a display unit that displays the input image containing the synthesized image after the image has been synthesized at the fixed position by the synthesis unit; and
a selection unit that, when a user operates on the synthesized image displayed at the fixed position in the displayed input image after synthesis, selects the object detected in the operated image as the tracking target to be tracked in predetermined processing.
- The tracking target selection device according to claim 1, further comprising:
a feature extraction unit that extracts a predetermined feature from the image of the target object candidate region being tracked by the tracking unit;
a determination unit that calculates a predetermined state of the target object from the feature extracted by the feature extraction unit and determines whether the calculated state is a predetermined state; and
a storage unit that, when the determination unit determines that the calculated state of the target object candidate region is the predetermined state, stores the tracking target candidate region so determined.
- The tracking target selection device according to claim 2, wherein the determination unit calculates a score indicating the state, and
the storage unit stores, together with the image of the tracking object candidate region, the calculated score of the state of that image.
- The tracking target selection device according to claim 2, wherein the determination unit compares the calculated score indicating the state with a score stored in advance in the storage unit, and determines whether to update the storage unit with the calculated score and the image of the state indicated by the calculated score, and
when the determination unit determines to update, the score stored in advance in the storage unit and the image stored in association with that score are updated, respectively, to the calculated score and the image of that score.
- The tracking target selection device according to claim 2, wherein the feature extraction unit extracts a feature amount indicating the direction appearing in the image of the tracking object candidate region,
the determination unit determines, based on the extracted feature amount, whether the direction indicated by that feature amount is a predetermined direction, and
when the direction is determined to be the predetermined direction, the tracking object candidate region from which the feature amount indicating that direction was extracted is stored in the storage unit.
- The tracking target selection device according to claim 5, wherein the tracking object candidate region is a region of a person's face,
the feature extraction unit extracts, as the feature amounts, face center coordinates, nose position coordinates, eye position coordinates, and a face size, and
the determination unit determines whether the direction is the predetermined direction from two differences: the difference between the face center coordinates and the nose position coordinates, and the difference between the face center coordinates and the coordinates of the center of the two eye position coordinates (an illustrative sketch follows the claims).
- The tracking target selection device according to claim 2, wherein the feature extraction unit extracts a feature amount indicating the facial expression of the person in the tracking object candidate region, and
the determination unit determines, based on the extracted feature amount, whether the facial expression of the person is a smiling expression.
- The tracking target selection device according to claim 2, wherein the feature extraction unit extracts, from the object in the tracking object candidate region, feature amounts necessary for character recognition, and
the determination unit determines, based on the extracted feature amounts, whether the characters appearing on the object have been successfully recognized.
- The tracking target selection device according to claim 2, wherein the feature extraction unit extracts, from the object in the tracking object candidate region, feature amounts necessary for object recognition, and
the determination unit performs the determination by matching features registered in advance in the storage unit against the features indicated by the extracted feature amounts.
- The tracking target selection device according to claim 2, wherein the synthesis unit synthesizes, in addition to the image of the tracking object candidate region, additional information obtained by the determination unit.
- The tracking target selection device according to claim 1, wherein the tracking target selection device is a camera comprising an image sensor that captures one video of a subject,
the display unit displays, together with the captured one video, the other video of the subject of the one video,
the selection unit, when an operation is performed on the displayed other video, selects the subject of the captured one video as the target of tracking in predetermined processing, and
the position of the displayed other video at a second time among a plurality of times is the same position as the position of that other video at a first time.
- The tracking target selection device according to claim 11, wherein the size of the subject in the displayed other video is equal to or larger than a predetermined threshold, and is not a size smaller than the threshold.
- The tracking target selection device according to claim 11 or 12, wherein the direction of the subject in the displayed other video is the same direction as a predetermined direction, and is not a different direction.
- The tracking target selection device according to claim 13, wherein the image sensor captures, before capturing the later video that is the one video, an earlier video in which the same subject as the subject of the later video is photographed and in which the direction of the photographed subject is the same direction as the predetermined direction, and
the display unit displays the other video, in the same direction as the predetermined direction, generated by using information on the captured earlier video.
- The tracking target selection device according to any one of claims 11 to 14, wherein the display unit displays, as the other video, a video generated from other video of the subject besides the one video and the other video, the generated video showing characters that identify the subject from among a plurality of subjects.
- The tracking target selection device according to claim 15, wherein the synthesis unit generates a video in which the characters identified by character recognition from the other video besides the one video and the other video are synthesized with that other video, and
the display unit displays the generated video as the other video.
- An integrated circuit provided in a tracking target selection device that selects a tracking target, comprising:
an object detection unit that detects a predetermined object from an input image;
a tracking unit that tracks the object detected by the object detection unit and calculates a tracking object candidate region in which the tracked object is present;
a synthesis unit that synthesizes an image of the tracking object candidate region calculated by the tracking unit at a fixed position in the input image;
a display control unit that causes a display unit to display the input image containing the synthesized image after the image has been synthesized at the fixed position by the synthesis unit; and
a selection unit that, when a user operates on the synthesized image displayed at the fixed position in the displayed input image after synthesis, selects the object detected in the operated image as the tracking target to be tracked in predetermined processing.
- A tracking target selection method for selecting a tracking target, comprising:
an object detection step of detecting a predetermined object from an input image;
a tracking step of tracking the object detected in the object detection step and calculating a tracking object candidate region in which the tracked object is present;
a synthesis step of synthesizing an image of the tracking object candidate region calculated in the tracking step at a fixed position in the input image;
a display step of displaying the input image containing the synthesized image after the image has been synthesized at the fixed position in the synthesis step; and
a selection step of, when a user operates on the synthesized image displayed at the fixed position in the input image displayed in the display step after synthesis, selecting the object detected in the operated image as the tracking target to be tracked in predetermined processing.
- A computer program for causing a computer to select a tracking target, the program causing the computer to execute:
an object detection step of detecting a predetermined object from an input image;
a tracking step of tracking the object detected in the object detection step and calculating a tracking object candidate region in which the tracked object is present;
a synthesis step of synthesizing an image of the tracking object candidate region calculated in the tracking step at a fixed position in the input image;
a display control step of causing a display unit to display the input image containing the synthesized image after the image has been synthesized at the fixed position in the synthesis step; and
a selection step of, when a user operates on the synthesized image displayed at the fixed position in the input image displayed by the display unit after synthesis, selecting the object detected in the operated image as the tracking target to be tracked in predetermined processing.
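As a worked illustration of the orientation test in claim 6, the following sketch compares the nose position and the midpoint of the two eyes against the face center; the normalization by face size and the threshold value are assumptions, since the claim names only the two coordinate differences:

```python
def is_facing_front(face_center, nose, left_eye, right_eye, face_size,
                    threshold=0.08):
    """Rough front-facing test in the style of claim 6.

    For a frontal face, both the nose and the midpoint of the two eyes lie
    close to the face center horizontally; when the face turns, both shift
    to one side. Landmarks are (x, y) pixel tuples; `face_size` normalizes
    the differences and `threshold` is an assumed decision value.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    dx_nose = abs(nose[0] - face_center[0]) / face_size
    dx_eyes = abs(eye_mid_x - face_center[0]) / face_size
    return dx_nose <= threshold and dx_eyes <= threshold
```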
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010800034826A CN102239687B (zh) | 2009-10-07 | 2010-10-05 | Tracking object selection device and method, and circuit therefor |
US13/133,223 US8432357B2 (en) | 2009-10-07 | 2010-10-05 | Tracking object selection apparatus, method, program and circuit |
EP10821737.3A EP2355492B1 (en) | 2009-10-07 | 2010-10-05 | Device, method, program, and circuit for selecting subject to be tracked |
JP2011535280A JP5399502B2 (ja) | 2009-10-07 | 2010-10-05 | Tracking target selection device, method, program, and circuit |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-233289 | 2009-10-07 | ||
JP2009233289 | 2009-10-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011043060A1 (ja) | 2011-04-14 |
Family
ID=43856544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/005956 WO2011043060A1 (ja) | 2009-10-07 | 2010-10-05 | 追尾対象選択装置、方法、プログラム及び回路 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8432357B2 (ja) |
EP (1) | EP2355492B1 (ja) |
JP (1) | JP5399502B2 (ja) |
CN (1) | CN102239687B (ja) |
WO (1) | WO2011043060A1 (ja) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4636064B2 (ja) * | 2007-09-18 | 2011-02-23 | Sony Corporation | Image processing apparatus, image processing method, and program |
EP2333718B1 (en) * | 2009-01-29 | 2013-08-28 | Nec Corporation | Feature amount selecting device |
US9633437B2 (en) * | 2010-03-22 | 2017-04-25 | Omnitek Partners Llc | Method for displaying successive image frames on a display to stabilize the display of a selected feature in the image frames |
KR101381439B1 (ko) * | 2011-09-15 | 2014-04-04 | Kabushiki Kaisha Toshiba | Face recognition apparatus and face recognition method |
US20130155308A1 (en) * | 2011-12-20 | 2013-06-20 | Qualcomm Incorporated | Method and apparatus to enhance details in an image |
CN102547209B (zh) * | 2012-02-06 | 2015-07-22 | Huawei Technologies Co., Ltd. | Video device control method and apparatus, and video system |
US9124800B2 (en) * | 2012-02-13 | 2015-09-01 | Htc Corporation | Auto burst image capture method applied to a mobile device, method for tracking an object applied to a mobile device, and related mobile device |
KR101231469B1 (ko) | 2012-02-23 | 2013-02-07 | Intel Corporation | Method and apparatus for supporting image processing, and computer-readable recording medium for executing the method |
JP5941736B2 (ja) * | 2012-04-10 | 2016-06-29 | Olympus Corporation | Imaging device |
JP5970937B2 (ja) | 2012-04-25 | 2016-08-17 | Sony Corporation | Display control device and display control method |
US9426408B2 (en) | 2012-08-27 | 2016-08-23 | Nokia Technologies Oy | Method and apparatus for recording video sequences |
US8933885B2 (en) * | 2012-09-25 | 2015-01-13 | Nokia Corporation | Method, apparatus, and computer program product for reducing hand or pointing device occlusions of a display |
US20140192205A1 (en) * | 2013-01-08 | 2014-07-10 | Samsung Electronics Co. Ltd. | Apparatus and method for object tracking during image capture |
JP5867424B2 (ja) * | 2013-02-28 | 2016-02-24 | Sony Corporation | Image processing apparatus, image processing method, and program |
KR20150113572A (ko) * | 2014-03-31 | 2015-10-08 | Samsung Electronics Co., Ltd. | Electronic device and method for acquiring image data |
US9729865B1 (en) | 2014-06-18 | 2017-08-08 | Amazon Technologies, Inc. | Object detection and tracking |
US10027883B1 (en) * | 2014-06-18 | 2018-07-17 | Amazon Technologies, Inc. | Primary user selection for head tracking |
US9977495B2 (en) * | 2014-09-19 | 2018-05-22 | Utherverse Digital Inc. | Immersive displays |
US9684830B2 (en) | 2014-11-14 | 2017-06-20 | Intel Corporation | Automatic target selection for multi-target object tracking |
JP6722878B2 (ja) * | 2015-07-30 | 2020-07-15 | Panasonic Intellectual Property Management Co., Ltd. | Face authentication device |
WO2017081839A1 (ja) * | 2015-11-13 | 2017-05-18 | Panasonic Intellectual Property Management Co., Ltd. | Moving object tracking method, moving object tracking device, and program |
JP6977624B2 (ja) * | 2018-03-07 | 2021-12-08 | Omron Corporation | Object detection device, object detection method, and program |
JP7024539B2 (ja) * | 2018-03-22 | 2022-02-24 | Casio Computer Co., Ltd. | Image editing device, image editing method, and program |
CN109191486A (zh) * | 2018-09-12 | 2019-01-11 | Guangzhou Yuechuangfu Technology Co., Ltd. | Pet image segmentation method and electronic device |
CN109218612B (zh) * | 2018-09-17 | 2022-04-22 | Dongguan Fengzhan Electronic Technology Co., Ltd. | Tracking imaging system and imaging method |
JP6579727B1 (ja) * | 2019-02-04 | 2019-09-25 | Qoncept, Inc. | Moving object detection device, moving object detection method, and moving object detection program |
JP7253440B2 (ja) * | 2019-05-09 | 2023-04-06 | Toshiba Tec Corporation | Tracking device and information processing program |
CN111324096B (zh) * | 2020-03-03 | 2021-04-23 | Zhengzhou Xufei Optoelectronic Technology Co., Ltd. | Substrate glass processing and packaging information traceability system and method |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4746295B2 (ja) * | 2003-08-25 | 2011-08-10 | Fujifilm Corporation | Digital camera and imaging method |
JP4140567B2 (ja) | 2004-07-14 | 2008-08-27 | Matsushita Electric Industrial Co., Ltd. | Object tracking device and object tracking method |
JP2006115406A (ja) * | 2004-10-18 | 2006-04-27 | Omron Corp | Imaging apparatus |
US7403643B2 (en) * | 2006-08-11 | 2008-07-22 | Fotonation Vision Limited | Real-time face tracking in a digital image acquisition device |
JP4264663B2 (ja) * | 2006-11-21 | 2009-05-20 | Sony Corporation | Imaging apparatus, image processing apparatus, image processing method therefor, and program causing a computer to execute the method |
JP4915420B2 (ja) * | 2006-12-11 | 2012-04-11 | Nikon Corporation | Electronic camera |
JP2008187591A (ja) * | 2007-01-31 | 2008-08-14 | Fujifilm Corp | Imaging apparatus and imaging method |
US8615112B2 (en) | 2007-03-30 | 2013-12-24 | Casio Computer Co., Ltd. | Image pickup apparatus equipped with face-recognition function |
JP4636064B2 (ja) * | 2007-09-18 | 2011-02-23 | Sony Corporation | Image processing apparatus, image processing method, and program |
US8265474B2 (en) * | 2008-03-19 | 2012-09-11 | Fujinon Corporation | Autofocus system |
KR101009881B1 (ko) * | 2008-07-30 | 2011-01-19 | Samsung Electronics Co., Ltd. | Apparatus and method for zoomed display of a target region of a reproduced image |
JP5279635B2 (ja) * | 2008-08-20 | 2013-09-04 | Canon Inc. | Image processing apparatus, image processing method, and program |
- 2010
- 2010-10-05 JP JP2011535280A patent/JP5399502B2/ja active Active
- 2010-10-05 EP EP10821737.3A patent/EP2355492B1/en active Active
- 2010-10-05 US US13/133,223 patent/US8432357B2/en active Active
- 2010-10-05 WO PCT/JP2010/005956 patent/WO2011043060A1/ja active Application Filing
- 2010-10-05 CN CN2010800034826A patent/CN102239687B/zh active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004252748A (ja) | 2003-02-20 | 2004-09-09 | Toshiba Corp | Image processing method and image processing apparatus |
JP2006350645A (ja) | 2005-06-15 | 2006-12-28 | Matsushita Electric Ind Co Ltd | Object detection device and learning device therefor |
JP2007074279A (ja) | 2005-09-06 | 2007-03-22 | Canon Inc | Imaging apparatus and control method therefor |
JP2008206018A (ja) * | 2007-02-22 | 2008-09-04 | Nikon Corp | Imaging apparatus and program |
JP2008278458A (ja) * | 2007-03-30 | 2008-11-13 | Casio Comput Co Ltd | Imaging apparatus, image display apparatus, and program therefor |
JP2009212637A (ja) * | 2008-03-03 | 2009-09-17 | Sanyo Electric Co Ltd | Imaging apparatus |
Non-Patent Citations (2)
Title |
---|
PRMU, vol. 107, no. 206, pages 211 - 224 |
See also references of EP2355492A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017216709A (ja) * | 2017-07-13 | 2017-12-07 | Nikon Corporation | Electronic camera |
Also Published As
Publication number | Publication date |
---|---|
CN102239687A (zh) | 2011-11-09 |
US20110241991A1 (en) | 2011-10-06 |
EP2355492A4 (en) | 2013-12-04 |
EP2355492B1 (en) | 2018-04-11 |
EP2355492A1 (en) | 2011-08-10 |
JPWO2011043060A1 (ja) | 2013-03-04 |
US8432357B2 (en) | 2013-04-30 |
CN102239687B (zh) | 2013-08-14 |
JP5399502B2 (ja) | 2014-01-29 |
Legal Events
Code | Title | Details
---|---|---
WWE | WIPO information: entry into national phase | Ref document number: 201080003482.6; Country of ref document: CN
WWE | WIPO information: entry into national phase | Ref document number: 2011535280; Country of ref document: JP
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10821737; Country of ref document: EP; Kind code of ref document: A1
WWE | WIPO information: entry into national phase | Ref document number: 2010821737; Country of ref document: EP
WWE | WIPO information: entry into national phase | Ref document number: 13133223; Country of ref document: US
NENP | Non-entry into the national phase | Ref country code: DE