WO2023282033A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program

Info

Publication number
WO2023282033A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
feature
unit
query
Prior art date
Application number
PCT/JP2022/024400
Other languages
French (fr)
Japanese (ja)
Inventor
拓実 小島
俊介 安木
祐介 加藤
Original Assignee
パナソニックIpマネジメント株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by パナソニックIPマネジメント株式会社
Publication of WO2023282033A1 publication Critical patent/WO2023282033A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present disclosure relates to an image processing device, an image processing method, and a program.
  • Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
  • An object of the present disclosure is to provide an image processing device, an image processing method, and a program capable of extracting features of an object appearing in an image with higher precision than the conventional technology.
  • One aspect of the present disclosure is an image processing device comprising: a feature extraction unit that extracts a feature amount indicating a feature of an object from a plurality of images each showing the same object; an attribute determination unit that determines an object attribute of the object included in at least two of the plurality of images; and a determination unit that, based on the object attributes determined by the attribute determination unit, determines to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
  • Another aspect of the present disclosure is an image processing method comprising: a feature extraction step of extracting a feature amount indicating a feature of an object from a plurality of images each showing the object; an attribute determination step of determining an object attribute of the object included in at least two of the plurality of images; and a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
  • A further aspect of the present disclosure provides a program for causing a control unit to execute the above image processing method.
  • According to the image processing device, the image processing method, and the program according to the present disclosure, the features of an object appearing in an image can be extracted with higher accuracy than in the prior art.
  • FIG. 3 is a flowchart of image processing executed by the image processing apparatus according to the first embodiment; FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus according to the first embodiment.
  • FIG. 5 is a block diagram showing a configuration example of the image processing apparatus according to the second embodiment; FIG. 6 is a flowchart of image processing executed by the image processing apparatus according to the second embodiment; FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus according to the second embodiment.
  • Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
  • However, with conventional image matching techniques, even when the query image and the matching target image both show the same person, if attributes such as the orientation, posture, or clothing of the person differ between the two images, the similarity between the features of the two images is calculated to be low, and the person may not be recognized as the same person.
  • For example, if the query image is a front-facing image in which a person's face is visible, while the matching target image shows the same person facing backward, the two images may not be recognized as showing the same person.
  • the present inventors have conducted research to solve the above problems and have developed an image matching device, an image matching method, and a program that extract the features of an object shown in an image more accurately than the conventional technology.
  • FIG. 1 is a schematic diagram showing an overview of an image processing device 100 according to an embodiment of the present disclosure.
  • An example of the image processing device 100 is an image matching device that matches the query images 50a and 50b with the matching target image 22a.
  • the query images 50a and 50b are, for example, human images detected from captured images generated by a surveillance camera, and the matching target image 22a is, for example, an image showing a person being searched.
  • the image processing device 100 acquires query images 50a and 50b, and detects the orientation and feature amount of a person appearing in each image.
  • the "feature amount” is an amount representing the features of an object such as a person appearing in an image. is represented by a vector quantity including the gradient of
  • When the query images 50a and 50b show the same person but the person's orientation differs between the two images, the image processing device 100 combines the feature amounts of the two images in which the object appears. When the feature amounts have been combined, the image processing device 100 compares the combined feature amount with the feature amount of the matching target image 22a for matching.
  • “combination" of a plurality of feature amounts assumes vector operation. For example, "combining" a plurality of feature amounts includes averaging the plurality of feature amounts, calculating a direct sum of the plurality of feature amounts, and calculating a difference between the plurality of feature amounts.
  • "combination" of a plurality of feature quantities may include weighting processing for increasing and emphasizing the feature when there is a common feature among the plurality of feature quantities.
  • The combined feature amount may have a different number of dimensions from the feature amount it is compared against.
  • In that case, the shape of the feature amount to be compared may be adjusted to match the combined feature amount. For example, if combining two feature amounts by direct sum changes the number of dimensions, the feature amount to be compared may be repeated (doubled) so that the combined feature amount and the feature amount to be compared have the same number of dimensions.
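  • The following is a minimal sketch of these combination operations, assuming the feature amounts are NumPy vectors; the function names and the particular choice of operations are illustrative assumptions, not part of the disclosure. See also the matching sketch given later for how a combined feature amount can be compared against a target.

```python
import numpy as np

def combine_average(features):
    # Average several feature vectors of equal length (one form of "combining").
    return np.mean(np.stack(features), axis=0)

def combine_direct_sum(features):
    # Direct sum: concatenate the vectors, which changes the number of dimensions.
    return np.concatenate(features)

def combine_difference(a, b):
    # Difference between two feature vectors of equal length.
    return a - b

def emphasize_common(a, b, gain=2.0, tol=1e-3):
    # Weighting that amplifies elements the two vectors (approximately) share.
    combined = (a + b) / 2.0
    combined[np.isclose(a, b, atol=tol)] *= gain
    return combined

def match_shape(target_feature, combined_feature):
    # If a direct sum doubled the dimensionality, tile the comparison-target
    # feature so both vectors have the same number of dimensions.
    repeats = combined_feature.size // target_feature.size
    return np.tile(target_feature, repeats)
```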
  • As a method of matching the combined feature amount against the feature amount to be compared, for example, the inner product of the combined feature amount and the feature amount obtained from the comparison target image is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different people).
  • Compared with an uncombined feature amount, the combined feature amount contains multifaceted information that views the person from different directions. Therefore, when the query images 50a and 50b and the matching target image 22a all show the same person, the combined feature amount of the query images 50a and 50b and the feature amount of the matching target image 22a have more matching elements than when the feature amounts are not combined. Conversely, when the query images 50a and 50b show the same person but the matching target image 22a shows a different person, the combined feature amount and the feature amount of the matching target image 22a have fewer matching elements than when the feature amounts are not combined. Combining the feature amounts of a plurality of images showing the same person in different orientations therefore improves the accuracy of matching the query images 50a and 50b against the matching target image 22a.
  • In matching, the image processing apparatus 100 determines, for example, a similarity indicating how similar two feature amounts are, and judges that the same person appears in both images when the similarity is equal to or greater than a predetermined threshold.
  • FIG. 2 is a block diagram showing a configuration example of the image processing apparatus 100 according to the first embodiment of the present disclosure.
  • The image processing apparatus 100 includes a control unit 1, a storage device 2, an image acquisition unit 3 that acquires image data 50, an input interface (I/F) 5, and an output interface (I/F) 4.
  • the control unit 1 implements the functions of the image processing apparatus 100 by executing information processing. Such information processing is realized by executing a program stored in the storage device 2 by the control unit 1, for example.
  • the control unit 1 includes a person detection unit 11, a query determination unit 12, a query tracking unit 13, an attribute determination unit 14, a feature extraction unit 15, a determination unit 16, a feature combination unit 17, and a matching unit 18. including.
  • the control unit 1 is composed of circuits such as a CPU, MPU, and FPGA.
  • the human detector 11 detects a human within the image data 50 .
  • a query determination unit 12 determines a query image. For example, the query determining unit 12 determines one of the human images detected by the human detecting unit 11 as the query image.
  • the query tracking unit 13 tracks the person indicated by the query image determined by the query determination unit 12 in the time-series image data group of the image data 50 .
  • the attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the input image, for example, the query image belongs.
  • the feature extraction unit 15 extracts feature amounts from an input image such as a query image.
  • the feature extraction unit 15 may use a feature extraction model 21 that inputs an image and outputs a feature amount vector of the image in order to extract the feature amount.
  • the determination unit 16 determines whether or not the object attribute of the person included in the query image determined by the query determination unit 12 has changed in the time-series image data group.
  • the feature combining unit 17 combines feature amounts of a plurality of images.
  • The matching unit 18 compares the combined feature amount of the plurality of images produced by the feature combining unit 17 with the feature amount of another image, thereby matching the person shown in the plurality of images against the person included in the other image.
  • a detailed example of the function of each component will be described later in relation to the operation of the image processing apparatus 100 .
  • the storage device 2 is a recording medium for recording various information including programs and data for causing the control unit 1 to execute image processing by the image processing device 100 .
  • the storage device 2 stores a later-described feature extraction model 21 that is a trained model, and an image list 22 that includes a matching target image group that is a matching target of the query image.
  • the storage device 2 is realized by, for example, a semiconductor storage device such as a flash memory, a solid state drive (SSD), a magnetic storage device such as a hard disk drive (HDD), or other recording media alone or in combination.
  • the storage device 2 may include volatile memory such as SRAM and DRAM.
  • the storage device 2 may be any of an internal type, an external type, and a NAS (network-attached storage) type.
  • the image acquisition unit 3 is an interface circuit that connects the image processing device 100 and external devices in order to input information such as the image data 50 to the image processing device 100 .
  • Such an external device is, for example, another information processing terminal (not shown) or a device such as a camera that acquires the image data 50 .
  • the image acquisition unit 3 may be a communication circuit that performs data communication according to existing wired communication standards or wireless communication standards.
  • the input interface 5 is an interface circuit that connects the image processing device 100 and an input device 80 such as a keyboard and a mouse in order to receive user input.
  • the input interface 5 may be a communication circuit that performs data communication according to existing wired communication standards or wireless communication standards.
  • the output interface 4 is an interface circuit that connects the image processing device 100 and an external output device in order to output information from the image processing device 100 .
  • Such external output devices include, for example, information processing terminals such as smartphones and tablets, and displays.
  • the output interface 4 may be a communication circuit that is connected to a network and performs data communication according to existing wired communication standards or wireless communication standards.
  • the image acquisition unit 3, input interface 5 and output interface 4 may be realized by separate or common hardware.
  • FIG. 3 is a flowchart of image processing executed by the image processing apparatus 100 .
  • the control unit 1 acquires the image data 50 via the image acquisition unit 3 (S101).
  • the image data 50 is, for example, a group of time-series image data captured by a camera installed in the city, on the premises, or the like.
  • the control unit 1 may sequentially acquire the image data 50 captured by the camera as frames in real time. Alternatively, the image data 50 may be recorded data recorded in advance.
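  • As a rough sketch of step S101, and under the assumption that the camera stream or a recording is read with OpenCV, frames could be acquired as follows; the video source and the frame handling shown here are illustrative only.

```python
import cv2

def acquire_frames(source=0):
    """Yield frames from a live camera (source=0) or a pre-recorded file path (S101)."""
    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of recording or camera error
                break
            yield frame
    finally:
        capture.release()

# Example: iterate over recorded data instead of a live camera.
# for frame in acquire_frames("recorded_data.mp4"):
#     process(frame)
```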
  • the human detection unit 11 detects a person in the image data 50 acquired in step S101 (S102).
  • detecting a person in the image data 50 includes detecting an area where a person exists in the image data 50 and detecting a person image.
  • the query determination unit 12 determines a query image (S103). For example, the query determining unit 12 determines one of the plurality of human images detected in step S102 as the query image.
  • Alternatively, the query determination unit 12 may set, as the query image, a person image selected by the user with the input device 80, such as a keyboard or mouse, from among the plurality of person images detected in step S102.
  • the query determination unit 12 may use an image of a person stored in advance in the storage device 2, an image of a person input via the image acquisition unit 3, or the like as a query image.
  • The feature extraction unit 15 extracts a feature amount (referred to as the "pre-change feature amount", in contrast to the post-change feature amount described later) from the query image (S104).
  • the feature extraction unit 15 may use a feature extraction model 21 that inputs an image and outputs a feature amount vector of the image in order to extract the feature amount.
  • a feature extraction model 21 is a trained model constructed by having the model learn the relationship between the learning image and the correct information.
  • The feature extraction model 21, which is a trained model, may be a model having the structure of a neural network, for example a convolutional neural network (CNN).
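  • As one hedged illustration of such a trained CNN feature extractor, a pretrained torchvision backbone with its classification head removed could serve as a stand-in; this is an assumed example, not the specific feature extraction model 21 of the disclosure.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-18 backbone with the final classification layer removed,
# so the output is a 512-dimensional feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(person_image):
    """Return a feature vector for a cropped person image (H x W x 3 uint8 array)."""
    with torch.no_grad():
        batch = preprocess(person_image).unsqueeze(0)
        return backbone(batch).squeeze(0).numpy()
```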
  • the attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the query image belongs (S105). For example, the attribute determining unit 14 assigns one of a plurality of predetermined object attributes to the query image.
  • An example of a predetermined object attribute is the orientation of the person in the image.
  • For example, the attribute determination unit 14 detects the orientation of the person in the query image by comparing the feature amount vector of the query image output by the feature extraction model 21 with feature amount vectors of person images having predetermined orientations.
  • the orientation of the person is, for example, the orientation of the face of the person in the query image, the orientation of the upper half of the body, the orientation of the lower half of the body, or an orientation determined by combining these pieces of information.
  • the attribute determination unit 14 may use an orientation detection model that outputs the orientation of the person in the human image by inputting the image of the person.
  • an orientation detection model is a trained model constructed by having the model learn the relationship between the learning image and the correct information.
  • a known skeleton detector, posture detector, and face orientation detector may be applied to the attribute determination unit 14 .
  • The orientation of the person detected in this way can be classified, for example, into eight directions as seen from the person in the image: forward, obliquely forward right, right, obliquely backward right, backward, obliquely backward left, left, and obliquely forward left.
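  • A simple sketch of this eight-way classification, assuming the orientation has first been estimated as an angle in degrees (0 = facing forward, increasing clockwise); the binning scheme below is an assumption for illustration.

```python
EIGHT_DIRECTIONS = [
    "forward", "obliquely forward right", "right", "obliquely backward right",
    "backward", "obliquely backward left", "left", "obliquely forward left",
]

def classify_orientation(angle_deg):
    """Map an estimated body or face orientation angle to one of eight direction labels."""
    index = int(((angle_deg % 360) + 22.5) // 45) % 8
    return EIGHT_DIRECTIONS[index]

# classify_orientation(10)  -> "forward"
# classify_orientation(170) -> "backward"
```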
  • the object attribute determined by the attribute determination unit 14 is not limited to the orientation of the person, and may be attributes such as the person's height, body type, hairstyle, and the like. A person's height, body shape, and hairstyle can be easily estimated from the image itself, for example, using well-known image recognition techniques.
  • the object attribute may be an attribute representing whether or not a person is wearing clothes of a specific shape such as a suit.
  • An object attribute may be an attribute representing whether a person is holding a bag, carrying a backpack, pulling a suitcase, making a phone call, and the like.
  • the object attribute may be an attribute indicating whether a person is riding a vehicle such as a bicycle or motorcycle, walking, running, stationary, standing or sitting, and the like.
  • The attribute determination unit 14 may detect the pose of the person in the query image and estimate an object attribute as described above based on the detected pose. Alternatively, the attribute determination unit 14 may use an attribute detection model that takes a person image as input and outputs the object attribute of that image. Such an attribute detection model is a trained model constructed by having the model learn the relationship between learning images and correct answer information.
  • the object attributes determined by the attribute determining unit 14 are not limited to human attributes, and may be attributes of non-human objects.
  • the object attribute may be the color, material, shape, etc. of the object.
  • The determination unit 16 determines whether the object attribute of the person included in the query image determined in step S103 has changed in the time-series image data group (S106). For example, the determination unit 16 compares the attribute information included in the object attributes of the person appearing in the query image and in the tracking image described later, and determines that the object attribute has changed if the difference in the attribute information is equal to or greater than a threshold, or if the classification information included in the attribute information differs. For example, if the person in the query image is carrying a package, the query image has a value of 1 for the package attribute information; if the person in the tracking image is not carrying a package, the tracking image has a value of 0 for that attribute information, and the difference between the two values is 1.
  • another example of the difference in attribute information is the difference between the direction in which the person in the query image is facing and the direction in which the person in the tracking image is facing, expressed as an angle.
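  • A minimal sketch of this determination in step S106, under the assumptions that object attributes are held as numeric attribute values plus a direction label and that a fixed angle threshold is used; the attribute names and thresholds are illustrative.

```python
DIRECTION_ANGLES = {"forward": 0, "right": 90, "backward": 180, "left": 270}

def attribute_changed(query_attrs, track_attrs, value_threshold=1, angle_threshold=90):
    """Return True if the object attribute is judged to have changed (S106)."""
    # Numeric attribute information (e.g. carrying a package: 1, not carrying: 0).
    for key in ("has_package", "riding_vehicle"):
        if abs(query_attrs.get(key, 0) - track_attrs.get(key, 0)) >= value_threshold:
            return True
    # Orientation expressed as an angle: compare the two facing directions.
    a = DIRECTION_ANGLES[query_attrs["direction"]]
    b = DIRECTION_ANGLES[track_attrs["direction"]]
    diff = abs(a - b) % 360
    return min(diff, 360 - diff) >= angle_threshold

# attribute_changed({"direction": "backward", "has_package": 1},
#                   {"direction": "forward",  "has_package": 1})  -> True
```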
  • If the object attribute has changed (Yes in S106), the determination unit 16 proceeds to step S108. In this case, the determination unit 16 has determined to combine the pre-change feature amount with the post-change feature amount described later.
  • If the object attribute has not changed (No in S106), the determination unit 16 proceeds to step S107.
  • In step S107, the query tracking unit 13 searches the time-series image data group of the image data 50 for the person indicated by the query image determined in step S103 (S107). For example, based on the position in the image of the person detected or tracked in a specific frame of the image data group, the query tracking unit 13 tracks that person in subsequent frames captured after the specific frame.
  • The person indicated by the query image determined in step S103 and the person indicated by the image tracked in step S107 (the tracking image) are the same person and are indicated, for example, by the same identification information (ID).
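  • A rough sketch of such position-based tracking, assuming each detection is an axis-aligned bounding box and that the detection in the next frame that best overlaps the previous position (by IoU) takes over the same ID; IoU matching is an assumed concrete choice, not stated in the disclosure.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def track_next(prev_box, detections, min_iou=0.3):
    """Pick the detection in the next frame that best overlaps the previous position (S107)."""
    best = max(detections, key=lambda box: iou(prev_box, box), default=None)
    return best if best is not None and iou(prev_box, best) >= min_iou else None
```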
  • Next, the feature extraction unit 15 extracts the feature amount of the person included in the tracking image after the attribute change (the post-change feature amount) (S108).
  • the feature combining unit 17 combines the pre-change feature amount extracted in step S104 and the post-change feature amount extracted in step S108 (S109).
  • In step S110, the matching unit 18 compares the feature amount combined in step S109 with the feature amount of each image in the matching target image group in the image list 22, thereby matching the person indicated by the query image against the person included in each image in the image list 22.
  • When the similarity between the compared feature amounts is equal to or greater than a predetermined threshold, the matching unit 18 determines that the person shown in the query image matches the person shown in the corresponding image in the image list 22.
  • the degree of similarity as described above is calculated, for example, by a predetermined degree of similarity calculation algorithm.
  • the matching unit 18 calculates the degree of similarity based on comparison of feature amount vectors.
  • The predetermined similarity calculation algorithm is, for example, an algorithm that calculates the similarity so that it becomes larger as the distance, such as the Euclidean distance or the Mahalanobis distance, between the feature amount vector combined in step S109 and the feature amount vector of each image in the image list 22 becomes smaller, or as their inner product becomes larger.
  • the predetermined similarity calculation algorithm may be an algorithm that applies a model constructed by metric learning to calculate the distance between a plurality of feature amount vectors.
  • the degree of similarity means, for example, that the larger the value, the higher the degree of matching between the compared two feature amount vectors.
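  • For illustration, a few similarity calculations of the kinds described above, assuming the feature amount vectors are NumPy arrays; which calculation the device actually uses is not fixed here, so these are hedged examples.

```python
import numpy as np

def similarity_from_distance(a, b):
    # Smaller Euclidean distance -> larger similarity, mapped into (0, 1].
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def inner_product_similarity(a, b):
    # Larger inner product -> larger similarity.
    return float(np.dot(a, b))

def cosine_similarity(a, b):
    # Inner product of the normalized vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_person(query_feature, target_feature, threshold=0.7):
    """Threshold decision of the kind used in matching (S110)."""
    return cosine_similarity(query_feature, target_feature) >= threshold
```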
  • Note that the number of dimensions of the combined feature amount vector may differ from that of the feature amount vector of the comparison target image. For example, when two feature amount vectors are combined by direct sum so that the number of dimensions changes from before combination, the feature amount vector of the comparison target image may be repeated (doubled) so that the combined feature amount vector and the feature amount vector of the comparison target image have the same number of dimensions.
  • In step S106, if the object attribute of the person included in the query image or the tracking image has not changed, the process returns to step S105 via step S107.
  • If the object attribute of the person included in the tracking image does not change, the control unit 1 may end the process of FIG. 3 without combining feature amounts.
  • In that case, the matching unit 18 may compare the feature amount extracted from the query image (the pre-change feature amount) with the feature amount of each image in the image list 22 to match the person indicated by the query image against the person included in each image in the image list 22.
  • After the process of FIG. 3 is completed, the query determination unit 12 may determine another of the plurality of person images detected in step S102 as the query image and execute the subsequent processes of steps S103 to S110 again. In this way, the control unit 1 may repeatedly execute the process of FIG. 3 so that it is carried out for every person appearing in the image data 50.
  • FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus 100.
  • FIG. 4 shows time-series image data 50 captured by the camera 6.
  • the control unit 1 detects a person in the image data 50 (S102), and sets the detected person image as a query image 50c (S103). At the same time or after this, the control unit 1 extracts the pre-change feature amount from the query image 50c (S104), and detects the object attribute of the person included in the query image 50c (S105).
  • the query image 50c has an object attribute of "backward".
  • Next, the control unit 1 tracks the person indicated by the query image 50c in the time-series image data 50 and detects the tracking image 50d (S107).
  • the control unit 1 detects the object attribute of the person included in the tracking image 50d (S105).
  • the person included in the tracking image 50d has the object attribute of "backward facing". Therefore, the process proceeds to No in step S106. In this manner, the control unit 1 repeats the loop until the object attribute of the person included in the tracking images changes, and further detects the tracking images 50e and 50f.
  • the person included in the tracking image 50e also has a "backward facing” object attribute, while the person included in the tracking image 50f has a "forward facing” object attribute.
  • the control unit 1 determines that the object attribute has changed from “backward facing” to "forward facing” (Yes in S106).
  • the control unit 1 extracts the feature amount of the tracking image 50f as the post-change feature amount (S108), and combines the pre-change feature amount and the post-change feature amount (S109).
  • the control unit 1 compares the query image with each image in the image list 22 by comparing the combined feature amount with each feature amount in the matching target image group in the image list 22 (S110 ).
  • As a matching method, for example, the inner product of the combined feature amount vector and the feature amount vector obtained from the image to be compared is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different persons). In this manner, the image processing apparatus 100 can accurately perform matching using the features of a plurality of query images showing the same person.
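  • Putting the steps of FIG. 3 together, the following is a compact sketch of the first-embodiment flow (S101-S110); the helper callables are passed in as parameters and, like the averaging and cosine-similarity choices, are assumptions made for illustration.

```python
import numpy as np

def match_with_attribute_change(frames, detect_person, extract_feature,
                                determine_attribute, image_list, threshold=0.7):
    """Sketch of the FIG. 3 flow: track a query person, combine features across an
    attribute change, then match against each image in the list (illustrative only).
    Tracking is simplified to re-detecting one person per frame."""
    query = detect_person(frames[0])                       # S102, S103
    pre_feature = extract_feature(query)                   # S104
    pre_attribute = determine_attribute(query)             # S105
    combined = pre_feature
    for frame in frames[1:]:                               # S107: follow the person
        tracked = detect_person(frame)
        if tracked is None:
            continue
        if determine_attribute(tracked) != pre_attribute:  # S106: attribute changed?
            post_feature = extract_feature(tracked)        # S108
            combined = (pre_feature + post_feature) / 2.0  # S109: combine (average)
            break
    results = []
    for target_image in image_list:                        # S110: match against the list
        target = extract_feature(target_image)
        sim = float(np.dot(combined, target) /
                    (np.linalg.norm(combined) * np.linalg.norm(target)))
        results.append(sim >= threshold)
    return results
```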
  • As described above, the image processing apparatus 100 includes the image acquisition unit 3, the person detection unit 11 (an example of an object detection unit), the query determination unit 12, the query tracking unit 13, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2.
  • the image acquisition unit 3 receives image data 50 including time-series image data groups.
  • the person detection unit 11 detects a person, which is an example of a target object, in the image data group.
  • the query determining unit 12 determines an image including one of the plurality of human images detected by the human detecting unit 11 as a query image.
  • the query tracking unit 13 searches and tracks the person indicated by the query image in chronological order in the image data group.
  • the storage device 2 stores an image list 22 including a matching target image group which is a matching target of a person included in a query image.
  • the feature extraction unit 15 extracts feature amounts from the query image determined by the query determination unit 12, the tracking image tracked by the query tracking unit 13, and the matching target image group.
  • the attribute determination unit 14 determines the object attributes of the person included in the query image and the object attribute of the person included in the tracking image. Based on the object attribute determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of at least two images of the query image and the tracking image. When the determining unit 16 determines to combine the feature amounts of at least two images of the query image and the tracking image, the feature combining unit 17 combines the feature amounts of the at least two images.
  • the image processing apparatus 100 can combine the features of a plurality of images (query image and tracking image) showing the same person, and extract the features of the person appearing in the image more accurately than in the conventional technology.
  • the determination unit 16 may determine to combine the feature amounts of the at least two images when the at least two images show the same person and the object attributes of the at least two images are different from each other.
  • the image processing apparatus 100 can combine the features of a plurality of images of the same person with different object attributes, and extract the features of the person in the image more accurately and efficiently.
  • The image processing apparatus 100 may further include a matching unit 18 that, when the feature combining unit 17 combines the feature amounts of the at least two images, matches the combined feature amount against the feature amounts of images in the image list 22 other than the at least two images.
  • the image processing apparatus 100 can accurately perform matching using the features of a plurality of images showing the same person.
  • When the similarity between the combined feature amount and the feature amount of an image other than the at least two images is equal to or greater than a predetermined threshold, the matching unit 18 may determine that the object shown in the at least two images matches the object shown in the image other than the at least two images.
  • FIG. 5 is a block diagram showing a configuration example of the image processing apparatus 200 according to the second embodiment of the present disclosure.
  • the storage device 2 stores an image list 222 instead of the image list 22 .
  • the image list 222 includes a plurality of images of the same person with different object attributes (see FIG. 7).
  • Each image in image list 222 includes, for example, the ID of the person represented by each image.
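  • One way to picture the image list 222 is as a small gallery of records keyed by person ID and object attribute; this data layout is an assumption made for illustration.

```python
# Illustrative gallery entries: the same person ID appears with different object attributes.
image_list_222 = [
    {"person_id": "X", "attribute": "forward",  "image_path": "x_front.jpg"},
    {"person_id": "X", "attribute": "backward", "image_path": "x_back.jpg"},
    {"person_id": "Y", "attribute": "forward",  "image_path": "y_front.jpg"},
    {"person_id": "Y", "attribute": "backward", "image_path": "y_back.jpg"},
]
```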
  • FIG. 6 is a flowchart illustrating the procedure of image processing executed by the control unit 1 of the image processing apparatus 200 according to the second embodiment.
  • the control unit 1 acquires image data 250 via the image acquisition unit 3 (S201).
  • the image data 250 may be one image, unlike the image data 50 of the first embodiment that includes time-series image data groups.
  • the human detection unit 11 detects a person in the image data 250 acquired in step S201 (S202).
  • the query determination unit 12 determines a query image (S203). For example, the query determining unit 12 determines one of the one or more human images detected in step S202 as the query image.
  • the feature extraction unit 15 extracts the feature amount of each image in the image list 222 (S204).
  • the attribute determining unit 14 detects to which of a plurality of predetermined object attributes the object attribute of a person included in each image of the image list 222 belongs (S205).
  • The determination unit 16 determines whether the image list 222 contains a plurality of images that show the same person but belong to different object attributes (S206). If the determination unit 16 determines that there are such images (Yes in S206), the process proceeds to step S207; otherwise (No in S206), the process proceeds to step S208. For example, if a plurality of images in the image list 222 show the same person, with some showing the person facing forward and others showing the person facing backward, the determination unit 16 determines that there are a plurality of images that show the same person but belong to different object attributes.
  • the feature combining unit 17 combines feature amounts of a plurality of images showing the same person but belonging to different object attributes (S207). The feature amount of each of the multiple images has already been extracted in step S204.
  • The matching unit 18 compares the feature amount combined in step S207 with the feature amount of the query image, thereby matching each image in the image list 222 against the query image (S208). When the process has proceeded from No in step S206 and feature amounts have not been combined in step S207, the matching unit 18 compares the feature amount of each image in the image list 222 with the feature amount of the query image to match each image in the image list 222 against the query image (S208).
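  • A minimal sketch of the second-embodiment flow (S204-S208), assuming the gallery is a list of records like the one sketched above and that features are NumPy vectors; grouping by person ID and averaging are illustrative choices, not prescribed by the disclosure.

```python
import numpy as np
from collections import defaultdict

def match_against_gallery(query_feature, gallery, extract_feature, threshold=0.7):
    """Combine gallery features per person across different attributes (S206, S207),
    then match the query against each combined feature (S208). Illustrative only."""
    grouped = defaultdict(list)
    for entry in gallery:               # S204, S205: feature and attribute per image
        grouped[entry["person_id"]].append(extract_feature(entry["image_path"]))

    matches = {}
    for person_id, features in grouped.items():
        if len(features) > 1:           # same person, different attributes -> combine
            feature = np.mean(np.stack(features), axis=0)
        else:
            feature = features[0]
        sim = float(np.dot(query_feature, feature) /
                    (np.linalg.norm(query_feature) * np.linalg.norm(feature)))
        matches[person_id] = sim >= threshold
    return matches
```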
  • FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus 200 according to the second embodiment.
  • FIG. 7 shows image data 250 captured by camera 6 .
  • the camera 6 is installed, for example, in the premises of a building, factory, or the like.
  • the image list 222 includes a plurality of images of the same person with different object attributes.
  • FIG. 7 shows an example in which the image list 222 includes a front-facing image of person X, a back-facing image of person X, a front-facing image of person Y, a back-facing image of person Y, a front-facing image of person Z, and a back-facing image of person Z.
  • The image list 222 is, for example, an employee image database containing images of employees who may be working on the premises.
  • the control unit 1 detects a person in the image data 250 (S202), and uses the detected person image as a query image (S203).
  • the control unit 1 extracts the feature amount of each image in the image list 222 (S204), and determines the object attribute of each image (S205).
  • The control unit 1 determines that the image list 222 includes a plurality of images showing the same person but belonging to different object attributes (Yes in S206), and combines the feature amount of the front-facing image of person X with the feature amount of the back-facing image of person X.
  • The feature amounts of the images of persons Y and Z are combined in the same way (S207).
  • the control unit 1 compares each image in the image list 222 with the query image by comparing the combined feature amount and the feature amount of the query image.
  • In this way, the image processing apparatus 200 can perform matching with high accuracy using the features of a plurality of images of the same person included in the image list 222. Furthermore, since the feature amounts of a plurality of images in the image list 222 are combined, the image captured by the camera 6 does not have to be compared against every image in the image list 222 individually; the number of comparisons and the amount of computation are reduced, which also improves processing speed.
  • As described above, the image processing apparatus 200 includes the image acquisition unit 3, the person detection unit 11 (an example of an object detection unit), the query determination unit 12, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2.
  • Image acquisition unit 3 receives image data 250 .
  • the person detection unit 11 detects a person, which is an example of a target object, in the image data 250 .
  • the query determining unit 12 determines an image including one of the human images detected by the human detecting unit 11 as a query image.
  • the storage device 2 stores an image list 222 that includes matching target images that are matching targets for people included in the query image.
  • the feature extraction unit 15 extracts the feature amount of each of the matching target image groups.
  • the attribute determination unit 14 determines object attributes of objects included in each of the matching target image groups. Based on the object attribute determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of at least two images in the matching target image group.
  • the feature combining unit 17 combines the feature amounts of at least two images when the determining unit 16 determines to combine the feature amounts of at least two images in the matching target image group.
  • the image processing apparatus 200 can combine the features of multiple query images showing the same person and extract the features of the person appearing in the image more accurately than in the conventional technology.
  • The image processing apparatus 200 may further include a matching unit 18 that, when the feature combining unit 17 combines the feature amounts of the at least two images, matches the combined feature amount against the feature amount of the query image, which is an image other than the at least two images.
  • the image processing apparatus 200 can accurately perform matching using the features of a plurality of matching target image groups showing the same person.
  • the configuration for matching the query image detected in the image data input from the outside with the matching target image included in the image list in the storage device 2 has been described.
  • In these configurations, examples have been described in which the feature amounts of a plurality of query images, or the feature amounts of a plurality of matching target images, are combined.
  • the embodiments of the present disclosure are not limited to this, and may be configured to combine feature amounts of at least two images.
  • the feature extraction unit 15 extracts feature amounts from multiple images each representing a target object.
  • the attribute determination unit 14 determines to which of a plurality of predetermined object attributes the object attributes of at least two images out of the plurality of images belong. Based on the object attributes determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of the at least two images. For example, the determination unit 16 compares the attribute information included in the object attributes of the at least two images, and determines to combine the feature amounts when the difference in the attribute information is equal to or greater than a threshold.
  • the feature combining unit 17 combines the feature amounts of the at least two images when the determining unit 16 determines to combine the feature amounts of the at least two images.
  • the present disclosure is applicable to image processing technology, such as image search technology and image matching technology.

Abstract

This image processing device comprises a feature extraction unit, an attribute selection unit, and a determination unit. The feature extraction unit extracts a feature value of a target object from a plurality of images in which the same target object is reflected. The attribute selection unit selects an object attribute of the target object included in at least two of the plurality of images. On the basis of the object attribute selected by the attribute selection unit, the determination unit determines to combine the feature values of the at least two images when there is a difference in the object attribute among the at least two images.

Description

Image processing device, image processing method, and program
The present disclosure relates to an image processing device, an image processing method, and a program.
An image matching technique for matching a query image and a matching target image is known. For example, Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
Japanese Patent No. 6831769
An object of the present disclosure is to provide an image processing device, an image processing method, and a program capable of extracting the features of an object appearing in an image with higher accuracy than the conventional technology.
One aspect of the present disclosure is an image processing device comprising: a feature extraction unit that extracts a feature amount indicating a feature of an object from a plurality of images each showing the same object; an attribute determination unit that determines an object attribute of the object included in at least two of the plurality of images; and a determination unit that, based on the object attributes determined by the attribute determination unit, determines to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
Another aspect of the present disclosure is an image processing method comprising: a feature extraction step of extracting a feature amount indicating a feature of an object from a plurality of images each showing the object; an attribute determination step of determining an object attribute of the object included in at least two of the plurality of images; and a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
A further aspect of the present disclosure provides a program for causing a control unit to execute the above image processing method.
According to the image processing device, the image processing method, and the program according to the present disclosure, it is possible to extract the features of an object appearing in an image with higher accuracy than in the prior art.
FIG. 1 is a schematic diagram showing an overview of an image processing device according to an embodiment of the present disclosure.
FIG. 2 is a block diagram showing a configuration example of the image processing apparatus according to the first embodiment.
FIG. 3 is a flowchart of image processing executed by the image processing apparatus according to the first embodiment.
FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus according to the first embodiment.
FIG. 5 is a block diagram showing a configuration example of the image processing apparatus according to the second embodiment.
FIG. 6 is a flowchart of image processing executed by the image processing apparatus according to the second embodiment.
FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus according to the second embodiment.
(Circumstances leading to the present disclosure)
An image matching technique for matching a query image and a matching target image is known. Such an image matching technique is used, for example, to search for a person to be searched for from among a plurality of captured images generated by a plurality of surveillance cameras installed in towns, on premises, and the like. For example, Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
However, with conventional image matching techniques, even when the query image and the matching target image both show the same person, if attributes such as the orientation, posture, or clothing of the person differ between the two images, the similarity between the features of the two images is calculated to be low, and the person may not be recognized as the same person. For example, if the query image is a front-facing image in which a person's face is visible, while the matching target image shows the same person facing backward, the two images may not be recognized as showing the same person.
The present inventors have conducted research to solve the above problems and have developed an image matching device, an image matching method, and a program that extract the features of an object shown in an image more accurately than the conventional technology.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, more detailed description than necessary may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially the same configurations may be omitted. This is to avoid unnecessary verbosity in the following description and to facilitate understanding by those skilled in the art.
Note that the applicant provides the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and does not intend to limit the claimed subject matter thereby.
1. Overview
FIG. 1 is a schematic diagram showing an overview of an image processing device 100 according to an embodiment of the present disclosure. An example of the image processing device 100 is an image matching device that matches the query images 50a and 50b with the matching target image 22a. The query images 50a and 50b are, for example, person images detected from captured images generated by a surveillance camera, and the matching target image 22a is, for example, an image showing a person being searched for.
In FIG. 1, the image processing device 100 acquires the query images 50a and 50b and detects the orientation and the feature amount of the person appearing in each image. Here, the "feature amount" is a quantity representing the features of an object, such as a person, appearing in an image, and is expressed, for example, as a vector quantity including pixel values such as the color, hue, luminance, and brightness of each pixel, and the gradients of pixel values between adjacent pixels.
When the query images 50a and 50b show the same person but the person's orientation differs between the two images, the image processing device 100 combines the feature amounts of the two images in which the object appears. When the feature amounts have been combined, the image processing device 100 compares the combined feature amount with the feature amount of the matching target image 22a for matching. Here, "combining" a plurality of feature amounts refers to a vector operation. For example, combining a plurality of feature amounts includes averaging them, calculating their direct sum, and calculating their difference. Combining a plurality of feature amounts may also include weighting processing that amplifies and emphasizes a feature that is common to the plurality of feature amounts. The combined feature amount may have a different number of dimensions from the feature amount it is compared against; in that case, the shape of the feature amount to be compared may be adjusted to match the combined feature amount. For example, if combining two feature amounts by direct sum changes the number of dimensions, the feature amount to be compared may be repeated (doubled) so that the combined feature amount and the feature amount to be compared have the same number of dimensions. As a method of matching the combined feature amount against the feature amount to be compared, for example, the inner product of the combined feature amount and the feature amount obtained from the comparison target image is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different people).
Compared with an uncombined feature amount, the combined feature amount contains multifaceted information that views the person from different directions. Therefore, when the query images 50a and 50b and the matching target image 22a all show the same person, the combined feature amount of the query images 50a and 50b and the feature amount of the matching target image 22a have more matching elements than when the feature amounts are not combined. Conversely, when the query images 50a and 50b show the same person but the matching target image 22a shows a different person, the combined feature amount and the feature amount of the matching target image 22a have fewer matching elements than when the feature amounts are not combined. In this way, combining the feature amounts of a plurality of images showing the same person in different orientations improves the accuracy of matching the query images 50a and 50b against the matching target image 22a. In matching, the image processing device 100 determines, for example, a similarity indicating how similar two feature amounts are, and judges that the same person appears in both images when the similarity is equal to or greater than a predetermined threshold.
2. First Embodiment
2-1. Configuration
FIG. 2 is a block diagram showing a configuration example of the image processing apparatus 100 according to the first embodiment of the present disclosure. The image processing apparatus 100 includes a control unit 1, a storage device 2, an image acquisition unit 3 that acquires image data 50, an input interface (I/F) 5, and an output interface (I/F) 4.
The control unit 1 implements the functions of the image processing apparatus 100 by executing information processing. Such information processing is realized, for example, by the control unit 1 executing a program stored in the storage device 2. The control unit 1 includes a person detection unit 11, a query determination unit 12, a query tracking unit 13, an attribute determination unit 14, a feature extraction unit 15, a determination unit 16, a feature combining unit 17, and a matching unit 18. The control unit 1 is composed of circuits such as a CPU, an MPU, and an FPGA.
An example of the function of each component of the control unit 1 is described below. The person detection unit 11 detects a person in the image data 50. The query determination unit 12 determines a query image; for example, it determines one of the person images detected by the person detection unit 11 as the query image. The query tracking unit 13 tracks the person indicated by the query image determined by the query determination unit 12 through the time-series image data group of the image data 50. The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of an object included in an input image, for example the query image, belongs. The feature extraction unit 15 extracts a feature amount from an input image such as the query image; to do so, it may use a feature extraction model 21 that takes an image as input and outputs a feature amount vector of the image. The determination unit 16 determines whether the object attribute of the person included in the query image determined by the query determination unit 12 has changed in the time-series image data group. The feature combining unit 17 combines the feature amounts of a plurality of images. The matching unit 18 compares the combined feature amount of the plurality of images with the feature amount of another image, thereby matching the person shown in the plurality of images against the person included in the other image. Detailed examples of the functions of these components are described later in relation to the operation of the image processing apparatus 100.
 The storage device 2 is a recording medium that records various kinds of information, including programs and data for causing the control unit 1 to execute the image processing of the image processing apparatus 100. For example, the storage device 2 stores a feature extraction model 21, which is a trained model described later, and an image list 22 that includes a group of matching target images against which the query image is matched. The storage device 2 is realized by, for example, a semiconductor storage device such as a flash memory or a solid-state drive (SSD), a magnetic storage device such as a hard disk drive (HDD), or another recording medium, alone or in combination. The storage device 2 may include volatile memory such as SRAM or DRAM. The storage device 2 may be of an internal type, an external type, or a NAS (network-attached storage) type.
 The image acquisition unit 3 is an interface circuit that connects the image processing apparatus 100 to external devices in order to input information such as the image data 50 into the image processing apparatus 100. Such an external device is, for example, another information processing terminal (not shown) or a device such as a camera that captures the image data 50. The image acquisition unit 3 may be a communication circuit that performs data communication according to an existing wired or wireless communication standard.
 The input interface 5 is an interface circuit that connects the image processing apparatus 100 to an input device 80 such as a keyboard or a mouse in order to receive user input. The input interface 5 may be a communication circuit that performs data communication according to an existing wired or wireless communication standard.
 The output interface 4 is an interface circuit that connects the image processing apparatus 100 to an external output device in order to output information from the image processing apparatus 100. Such external output devices include, for example, information processing terminals such as smartphones and tablets, and displays. The output interface 4 may be a communication circuit that is connected to a network and performs data communication according to an existing wired or wireless communication standard. The image acquisition unit 3, the input interface 5, and the output interface 4 may be realized by separate hardware or by common hardware.
2-2. Operation
 FIG. 3 is a flowchart of the image processing executed by the image processing apparatus 100.
 The control unit 1 acquires the image data 50 via the image acquisition unit 3 (S101). The image data 50 is, for example, a group of time-series image data captured by a camera installed in a street, on premises, or the like. The control unit 1 may sequentially acquire the image data 50 captured by the camera as frames in real time. Alternatively, the image data 50 may be recorded video data stored in advance.
 The person detection unit 11 detects a person in the image data 50 acquired in step S101 (S102). Here, detecting a person in the image data 50 includes detecting a region where a person exists in the image data 50 and detecting a person image.
 The query determination unit 12 determines a query image (S103). For example, the query determination unit 12 determines one of the plurality of person images detected in step S102 as the query image.
 Alternatively, in step S103, the query determination unit 12 may set, as the query image, a person image that the user selects with the input device 80 such as a keyboard or a mouse from among the plurality of person images detected in step S102. The query determination unit 12 may also use, as the query image, an image of a person stored in advance in the storage device 2, an image of a person input via the image acquisition unit 3, or the like.
 The feature extraction unit 15 extracts a feature amount from the query image (referred to as the "pre-change feature amount" in contrast to the post-change feature amount described later) (S104). To extract the feature amount, the feature extraction unit 15 may use a feature extraction model 21 that receives an image and outputs a feature amount vector of the image. Such a feature extraction model 21 is a trained model constructed by having a model learn the relationship between training images and ground-truth information. The feature extraction model 21 may be a model having the structure of a neural network, for example a convolutional neural network (CNN). When the feature extraction model 21 is constructed by training a model such as a CNN, the output of a convolutional layer or a pooling layer can be used as the feature amount. Therefore, the feature extraction model 21 may be a model from which the fully connected layer at the final stage has been removed.
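 As a minimal illustrative sketch (not part of the disclosure), a feature extraction model of this kind could be realized as follows in Python; the torchvision ResNet-50 backbone, the 224x224 input size, and the function name extract_feature are assumptions made only for illustration.

# Sketch: a CNN feature extractor whose final fully connected layer is removed,
# so that the pooled convolutional output serves as the feature amount vector.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the fully connected layer at the final stage
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(person_image: Image.Image) -> torch.Tensor:
    # Returns a pooled CNN feature vector (here 2048-dimensional) for one person image.
    with torch.no_grad():
        x = preprocess(person_image).unsqueeze(0)   # shape (1, 3, 224, 224)
        return backbone(x).squeeze(0)               # shape (2048,)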
 The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the query image belongs (S105). For example, the attribute determination unit 14 assigns one of the plurality of predetermined object attributes to the query image.
 One example of a predetermined object attribute is the orientation of the person appearing in the image. In this case, the attribute determination unit 14 detects the orientation of the person in the query image by comparing the feature amount vector of the query image output by the feature extraction model 21 with feature amount vectors of person images having predetermined orientations. The orientation of the person is, for example, the orientation of the person's face, the orientation of the upper body, the orientation of the lower body, or an orientation determined by combining these pieces of information.
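 As an illustrative sketch only, the comparison with feature amount vectors of person images having predetermined orientations could follow a nearest-prototype rule such as the one below; the cosine-similarity criterion and the dictionary of reference vectors are assumptions, not requirements of the disclosure.

# Sketch: assign the orientation whose reference (prototype) feature vector
# is most similar to the query feature vector.
import numpy as np

def detect_orientation(query_feature: np.ndarray,
                       reference_features: dict) -> str:
    # reference_features maps an orientation label to a prototype feature vector.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(reference_features,
               key=lambda label: cosine(query_feature, reference_features[label]))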
 When determining the orientation of a person as the object attribute, the attribute determination unit 14 may use an orientation detection model that receives a person image and outputs the orientation of the person in that image. Such an orientation detection model is a trained model constructed by having a model learn the relationship between training images and ground-truth information. A known skeleton detector, posture detector, or face orientation detector may also be applied to the attribute determination unit 14.
 The orientation of the person detected in this way can be classified into, for example, eight directions as seen from the person appearing in the image: forward, diagonally forward right, right, diagonally backward right, backward, diagonally backward left, left, and diagonally forward left.
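 Purely for illustration, and assuming an orientation detector that outputs a body yaw angle in degrees (0 meaning facing the camera, increasing clockwise, an assumed convention), the eight classes above could be obtained by quantizing the angle into 45-degree bins:

# Sketch: quantize an estimated yaw angle into the eight orientation classes.
ORIENTATIONS = [
    "forward", "diagonally forward right", "right", "diagonally backward right",
    "backward", "diagonally backward left", "left", "diagonally forward left",
]

def classify_orientation(yaw_deg: float) -> str:
    # Map a yaw angle to one of eight 45-degree orientation bins.
    bin_index = int(((yaw_deg % 360.0) + 22.5) // 45.0) % 8
    return ORIENTATIONS[bin_index]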
 The object attribute determined by the attribute determination unit 14 is not limited to the orientation of a person, and may be an attribute such as the person's height, body type, or hairstyle. A person's height, body type, and hairstyle can be easily estimated from the image itself using, for example, well-known image recognition techniques. The object attribute may also be an attribute representing whether or not the person is wearing clothing of a specific shape, such as a suit. The object attribute may be an attribute representing whether the person is holding a bag, carrying a backpack, pulling a suitcase, talking on the phone, and so on. Furthermore, the object attribute may be an attribute indicating whether the person is riding a vehicle such as a bicycle or motorcycle, whether the person is walking, running, or stationary, whether the person is standing or sitting, and so on.
 The attribute determination unit 14 may detect the posture of the person in the query image and estimate object attributes such as those described above based on the detected posture. Alternatively, the attribute determination unit 14 may use an attribute detection model that receives a person image and outputs the object attribute of that image. Such an attribute detection model is a trained model constructed by having a model learn the relationship between training images and ground-truth information.
 The object attributes determined by the attribute determination unit 14 are not limited to attributes of a person, and may be attributes of an object other than a person. For example, the object attribute may be the color, material, or shape of the object.
 The determination unit 16 determines whether or not the object attribute of the person included in the query image determined in step S103 has changed in the time-series image data group (S106). For example, the determination unit 16 compares the attribute information included in the object attributes of the person appearing in the query image and in a tracking image described later, and determines that the object attribute has changed when the difference in the attribute information is equal to or greater than a threshold, when the classification information included in the attribute information differs, and so on. For example, if the person in the query image is carrying baggage, the query image has a value of 1 as the attribute information concerning baggage, whereas if the person in the tracking image is not carrying baggage, the tracking image has a value of 0 as that attribute information, and the difference between the two pieces of attribute information is 1. Another example of the difference in attribute information is the difference, expressed as an angle, between the direction the person in the query image is facing and the direction the person in the tracking image is facing. When the determination unit 16 determines that the object attribute has changed (Yes in S106), the process proceeds to step S108. In this case, the determination unit 16 has determined to combine the pre-change feature amount with the post-change feature amount described later. When the determination unit 16 determines that the object attribute has not changed (No in S106), the process proceeds to step S107.
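 A minimal sketch of this decision follows, assuming attribute information encoded as binary flags plus an orientation angle and assuming the particular attribute names and thresholds shown; none of these is prescribed by the disclosure.

# Sketch: decide whether the object attribute has changed between the query
# image and a tracking image.
from typing import Dict

def attribute_changed(query_attrs: Dict[str, float],
                      track_attrs: Dict[str, float],
                      flag_threshold: float = 1.0,
                      angle_threshold_deg: float = 45.0) -> bool:
    # Return True if any attribute differs by at least its threshold.
    for key in ("has_baggage", "riding_vehicle"):          # assumed binary attribute flags
        if abs(query_attrs[key] - track_attrs[key]) >= flag_threshold:
            return True
    diff = abs(query_attrs["orientation_deg"] - track_attrs["orientation_deg"]) % 360.0
    diff = min(diff, 360.0 - diff)                          # wrap-around angle difference
    return diff >= angle_threshold_deg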
 When the determination unit 16 determines that the object attribute has not changed (No in S106), the query tracking unit 13 tracks the person indicated by the query image determined in step S103 in the time-series image data group of the image data 50 (S107). For example, based on the position in the image of the person detected or tracked in a specific frame of the image data group, the query tracking unit 13 tracks that person in subsequent frames captured after the specific frame. Therefore, the person indicated by the query image determined in step S103 and the person indicated by the image tracked in step S107 (referred to herein as the "tracking image") are the same person, for example a person having the same identification information (ID). Tracking of an object such as a person can be realized, for example, by storing the object in a specific frame as a template in the storage device 2 and searching within subsequent frames using that template, for example by applying a known technique such as template matching. The attribute determination unit 14 also determines the object attribute of the tracking image as needed. After the tracking process of step S107 is completed, the process returns to step S105.
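 The template matching mentioned above could be sketched as follows; the use of OpenCV's normalized cross-correlation and the 0.7 score threshold are assumptions made only for illustration.

# Sketch: locate the stored person template in a subsequent frame.
import cv2
import numpy as np

def track_person(template: np.ndarray, next_frame: np.ndarray,
                 score_threshold: float = 0.7):
    # Returns the bounding box (x, y, w, h) of the best match, or None if the
    # match score falls below the threshold (e.g. the person left the view).
    result = cv2.matchTemplate(next_frame, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < score_threshold:
        return None
    h, w = template.shape[:2]
    return (max_loc[0], max_loc[1], w, h)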
 When the determination unit 16 determines that the object attribute has changed (Yes in S106), the feature extraction unit 15 extracts the feature amount of the person included in the tracking image after the attribute change (the post-change feature amount) (S108).
 The feature combining unit 17 combines the pre-change feature amount extracted in step S104 with the post-change feature amount extracted in step S108 (S109).
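 The disclosure leaves the combination method open; as one illustrative sketch, the two feature amount vectors could be directly summed (concatenated), or averaged if the original dimensionality should be preserved. Both variants below are assumptions for illustration.

# Sketch: two possible ways to combine the pre-change and post-change features.
import numpy as np

def combine_features(pre_change: np.ndarray, post_change: np.ndarray) -> np.ndarray:
    # Direct sum: the combined vector has twice the original number of dimensions.
    return np.concatenate([pre_change, post_change])

def combine_features_mean(pre_change: np.ndarray, post_change: np.ndarray) -> np.ndarray:
    # Alternative: element-wise average keeps the original number of dimensions.
    return (pre_change + post_change) / 2.0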
 When the feature amounts have been combined in step S109, the matching unit 18 compares the feature amount combined in step S109 with the feature amount of each image in the group of matching target images in the image list 22, thereby matching the person indicated by the query image against the person included in each image in the image list 22. Specifically, the matching unit 18 determines that the person shown in the query image and the person shown in an image in the image list 22 match, for example, when the degree of similarity between the feature amount combined in step S109 and the feature amount of that image is equal to or greater than a predetermined threshold.
 The degree of similarity described above is calculated, for example, by a predetermined similarity calculation algorithm. For example, the matching unit 18 calculates the similarity based on a comparison of feature amount vectors. The predetermined similarity calculation algorithm is, for example, an algorithm that calculates the similarity such that it increases as a distance, such as the Euclidean distance or the Mahalanobis distance, between the feature amount vector combined in step S109 and the feature amount vector of each image in the image list 22 decreases, or, when an inner product is used, as the inner product increases. The predetermined similarity calculation algorithm may also be an algorithm that calculates the distance between a plurality of feature amount vectors by applying a model constructed through metric learning. A larger similarity value means, for example, a higher degree of agreement between the two compared feature amount vectors. The combined feature amount vector may differ in the number of dimensions from the feature amount vector of the comparison target image; in that case, the composition of the feature amount vector of the comparison target image may be adjusted to match the combined feature amount vector. For example, when the two feature amount vectors to be combined are directly summed and the number of dimensions changes compared to before the combination, the feature amount vector of the comparison target image may be duplicated to twice its length so that the numbers of dimensions of the combined feature amount vector and the comparison target image's feature amount vector are unified.
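 A sketch of the dimension adjustment and similarity comparison described above follows; the use of cosine similarity and the 0.6 threshold are assumptions chosen only for illustration.

# Sketch: match a combined (concatenated) feature vector against a gallery
# feature vector, duplicating the gallery vector to unify dimensionality.
import numpy as np

def match_combined(combined: np.ndarray, gallery: np.ndarray,
                   threshold: float = 0.6) -> bool:
    # Return True if the two vectors are judged to show the same person.
    if combined.shape[0] == 2 * gallery.shape[0]:
        gallery = np.tile(gallery, 2)            # unify the number of dimensions
    similarity = np.dot(combined, gallery) / (
        np.linalg.norm(combined) * np.linalg.norm(gallery) + 1e-12)
    return similarity >= threshold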
 It was explained above that in step S106, when the object attribute of the person included in the query image or the tracking image does not change, the process returns to step S105 via step S107. However, the control unit 1 may end the processing of FIG. 3 once the tracking process has been completed for the entire time-series image data group, even if the object attribute of the person included in the tracking image has not changed. Alternatively, when the object attribute of the person included in the tracking image does not change even after the tracking process has been completed for the entire time-series image data group, the matching unit 18 may match the person indicated by the query image against the person included in each image in the image list 22 by comparing the feature amount of the query image extracted in step S104 (the pre-change feature amount) with the feature amount of each image in the image list 22.
 After executing the subsequent processes of steps S104 to S110 for the query image determined in step S103, the query determination unit 12 may determine another of the plurality of person images detected in step S102 as the query image and then execute the subsequent processes of steps S103 to S110. In this way, the control unit 1 may repeatedly execute the processing of FIG. 3 so that it is executed for every person appearing in the image data 50.
 FIG. 4 is a schematic diagram for explaining the image processing of the image processing apparatus 100. FIG. 4 shows time-series image data 50 captured by the camera 6. The control unit 1 detects a person in the image data 50 (S102) and sets the detected person image as the query image 50c (S103). At the same time or thereafter, the control unit 1 extracts the pre-change feature amount from the query image 50c (S104) and detects the object attribute of the person included in the query image 50c (S105). In the example of FIG. 4, the query image 50c has the object attribute "backward facing".
 In the first pass of the image processing flow, the object attribute of the query image 50c has not changed (No in S106), so the control unit 1 tracks the person indicated by the query image 50c in the time-series image data 50 and detects a tracking image 50d (S107). Next, the control unit 1 detects the object attribute of the person included in the tracking image 50d (S105). In the example shown in FIG. 4, the person included in the tracking image 50d has the "backward facing" object attribute, so the process again proceeds to No in step S106. In this manner, the control unit 1 repeats the loop until the object attribute of the person included in the tracking image changes, and further detects tracking images 50e and 50f. The person included in the tracking image 50e also has the "backward facing" object attribute, whereas the person included in the tracking image 50f has the "forward facing" object attribute. When the control unit 1 detects the object attribute of the person included in the tracking image 50f, it determines that the object attribute has changed from "backward facing" to "forward facing" (Yes in S106).
 Therefore, the control unit 1 extracts the feature amount of the tracking image 50f as the post-change feature amount (S108) and combines the pre-change feature amount with the post-change feature amount (S109). Next, the control unit 1 matches the query image against each image in the image list 22 by comparing the combined feature amount with the feature amount of each image in the group of matching target images in the image list 22 (S110). As a method of matching the combined feature amount vector against the feature amount vector of each image in the image list 22, for example, the inner product of the combined feature amount vector and the feature amount vector obtained from the comparison target image may be calculated, and the objects in the two images may be judged to be the same (for example, the same person) if the inner product is equal to or greater than a certain threshold, and different (for example, different persons) if it is below the threshold. In this way, the image processing apparatus 100 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person.
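 For illustration only, the flow of FIG. 4 can be condensed with toy 4-dimensional vectors; the numeric values and the 0.5 inner-product threshold are assumptions with no significance beyond the example.

# Sketch: pre-change and post-change features are combined and matched against
# one gallery feature by an inner product with a threshold.
import numpy as np

pre_change  = np.array([0.9, 0.1, 0.0, 0.2])   # feature of backward-facing query image 50c
post_change = np.array([0.1, 0.8, 0.3, 0.0])   # feature of forward-facing tracking image 50f
combined = np.concatenate([pre_change, post_change])          # S109

gallery_feature = np.array([0.5, 0.5, 0.1, 0.1])              # one image in image list 22
gallery_matched = np.tile(gallery_feature, 2)                 # unify dimensionality
score = float(np.dot(combined, gallery_matched))              # inner product (S110)
same_person = score >= 0.5                                    # threshold comparison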
2-3. Effects, etc.
 As described above, the image processing apparatus 100 includes the image acquisition unit 3, the person detection unit 11 as an example of an object detection unit, the query determination unit 12, the query tracking unit 13, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2. The image acquisition unit 3 receives image data 50 including a time-series image data group. The person detection unit 11 detects a person, as an example of an object, in the image data group. The query determination unit 12 determines an image including one of the plurality of person images detected by the person detection unit 11 as the query image. The query tracking unit 13 searches for and tracks the person indicated by the query image in chronological order in the image data group. The storage device 2 stores an image list 22 including a group of matching target images against which the person included in the query image is matched. The feature extraction unit 15 extracts feature amounts from the query image determined by the query determination unit 12, the tracking images tracked by the query tracking unit 13, and the group of matching target images. The attribute determination unit 14 determines the object attribute of the person included in the query image and the object attribute of the person included in each tracking image. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of at least two images among the query image and the tracking images. When the determination unit 16 determines to combine the feature amounts of at least two images among the query image and the tracking images, the feature combining unit 17 combines the feature amounts of those at least two images.
 With this configuration, the image processing apparatus 100 can combine the features of a plurality of images (the query image and the tracking images) showing the same person, and can extract the features of the person appearing in the images with higher accuracy than the conventional technology.
 The determination unit 16 may determine to combine the feature amounts of the at least two images when the at least two images show the same person and the object attributes of the at least two images are different from each other.
 With this configuration, the image processing apparatus 100 can combine the features of a plurality of images of the same person having different object attributes, and can extract the features of the person appearing in the images more accurately and efficiently.
 The image processing apparatus 100 may further include a matching unit 18 that, when the feature amounts of the at least two images have been combined by the feature combining unit 17, matches the combined feature amount against the feature amounts of images in the image list 22, which are images other than the at least two images.
 With this configuration, the image processing apparatus 100 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person.
 Specifically, the matching unit 18 may determine that the object shown in the at least two images and the object shown in an image other than the at least two images match when the degree of similarity between the combined feature amount and the feature amount of the image other than the at least two images is equal to or greater than a predetermined threshold.
3. Second Embodiment
3-1. Configuration
 FIG. 5 is a block diagram showing a configuration example of the image processing apparatus 200 according to the second embodiment of the present disclosure. Compared with the image processing apparatus 100 according to the first embodiment, the storage device 2 stores an image list 222 instead of the image list 22. The image list 222 includes a plurality of images of the same person with mutually different object attributes (see FIG. 7). Each image in the image list 222 is associated with, for example, the ID of the person shown in that image.
3-2. Operation
 FIG. 6 is a flowchart illustrating the procedure of the image processing executed by the control unit 1 of the image processing apparatus 200 according to the second embodiment.
 The control unit 1 acquires image data 250 via the image acquisition unit 3 (S201). Unlike the image data 50 of the first embodiment, which includes a time-series image data group, the image data 250 may be, for example, a single image.
 The person detection unit 11 detects a person in the image data 250 acquired in step S201 (S202). The query determination unit 12 determines a query image (S203). For example, the query determination unit 12 determines one of the one or more person images detected in step S202 as the query image.
 The feature extraction unit 15 extracts the feature amount of each image in the image list 222 (S204). The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the object attribute of the person included in each image of the image list 222 belongs (S205).
 The determination unit 16 determines whether or not the image list 222 contains a plurality of images that show the same person but belong to mutually different object attributes (S206). When the determination unit 16 determines that there is such a plurality of images (Yes in S206), the process proceeds to step S207; when it determines that there is not (No in S206), the process proceeds to step S208. For example, when a plurality of images in the image list 222 show the same person, some of those images show the person facing forward, and other images show the person facing backward, the determination unit 16 determines that there is a plurality of images showing the same person but belonging to mutually different object attributes.
 When the process proceeds to Yes in step S206, the feature combining unit 17 combines the feature amounts of the plurality of images that show the same person but belong to mutually different object attributes (S207). The feature amount of each of those images has already been extracted in step S204.
 When the feature amounts have been combined in step S207, the matching unit 18 matches each image in the image list 222 against the query image by comparing the feature amount combined in step S207 with the feature amount of the query image (S208). When the process proceeds to No in step S206 and no feature amounts are combined in step S207, the matching unit 18 matches each image in the image list 222 against the query image by comparing the feature amount of each image in the image list 222 with the feature amount of the query image (S208).
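 A sketch of steps S204 to S207 on the image list side is given below, assuming the image list is available as (person ID, object attribute, feature vector) tuples and that per-person averaging is used as the combination method; both are assumptions, since the disclosure leaves the combination method open.

# Sketch: group the image list 222 by person ID and combine the features of
# images of the same person that have different object attributes.
from collections import defaultdict
from typing import Dict, List, Tuple
import numpy as np

def combine_gallery_features(
        gallery: List[Tuple[str, str, np.ndarray]]  # (person_id, attribute, feature)
) -> Dict[str, np.ndarray]:
    # Return one combined feature vector per person ID.
    grouped: Dict[str, List[np.ndarray]] = defaultdict(list)
    attributes: Dict[str, set] = defaultdict(set)
    for person_id, attribute, feature in gallery:
        grouped[person_id].append(feature)
        attributes[person_id].add(attribute)
    combined: Dict[str, np.ndarray] = {}
    for person_id, feats in grouped.items():
        if len(attributes[person_id]) >= 2:          # different attributes: combine
            combined[person_id] = np.mean(feats, axis=0)
        else:                                        # single attribute: keep as-is
            combined[person_id] = feats[0]
    return combined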
 FIG. 7 is a schematic diagram for explaining the image processing of the image processing apparatus 200 according to the second embodiment. FIG. 7 shows image data 250 captured by the camera 6. The camera 6 is installed, for example, on the premises of a building, a factory, or the like.
 In FIG. 7, the image list 222 includes a plurality of images of the same person with mutually different object attributes. FIG. 7 shows an example in which the image list 222 includes a forward-facing image of person X, a backward-facing image of person X, a forward-facing image of person Y, a backward-facing image of person Y, a forward-facing image of person Z, and a backward-facing image of person Z. The image list 222 is, for example, an employee image database that includes images of employees who may work on the premises.
 The control unit 1 detects a person in the image data 250 (S202) and sets the detected person image as the query image (S203). The control unit 1 extracts the feature amount of each image in the image list 222 (S204) and determines the object attribute of each image (S205). In the example shown in FIG. 7, the control unit 1 determines that the image list 222 contains a plurality of images that show the same person but belong to mutually different object attributes (Yes in S206), and combines the feature amount of the forward-facing image of person X with the feature amount of the backward-facing image of person X. The feature amounts of the images of persons Y and Z are combined in the same manner (S207). Next, the control unit 1 matches each image in the image list 222 against the query image by comparing the combined feature amounts with the feature amount of the query image (S208).
 In this way, the image processing apparatus 200 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person that are included in the image list 222. Furthermore, according to the image processing apparatus 200, since the feature amounts of a plurality of images in the image list 222 are consolidated by combination, comparing the image captured by the camera 6 against every image in the image list 222 can be avoided, which reduces the number of comparisons and the amount of computation. This also leads to an improvement in processing speed.
3-3. Effects, etc.
 As described above, the image processing apparatus 200 includes the image acquisition unit 3, the person detection unit 11 as an example of an object detection unit, the query determination unit 12, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2. The image acquisition unit 3 receives image data 250. The person detection unit 11 detects a person, as an example of an object, in the image data 250. The query determination unit 12 determines an image including one of the person images detected by the person detection unit 11 as the query image. The storage device 2 stores an image list 222 including a group of matching target images against which the person included in the query image is matched. The feature extraction unit 15 extracts the feature amount of each image in the group of matching target images. The attribute determination unit 14 determines the object attribute of the object included in each image of the group of matching target images. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of at least two images in the group of matching target images. When the determination unit 16 determines to combine the feature amounts of at least two images in the group of matching target images, the feature combining unit 17 combines the feature amounts of those at least two images.
 With this configuration, the image processing apparatus 200 can combine the features of a plurality of images showing the same person, and can extract the features of the person appearing in the images with higher accuracy than the conventional technology.
 The image processing apparatus 200 may further include a matching unit 18 that, when the feature amounts of the at least two images have been combined by the feature combining unit 17, matches the combined feature amount against the feature amount of the query image, which is an image other than the at least two images.
 With this configuration, the image processing apparatus 200 can perform matching with high accuracy by utilizing the features of a plurality of matching target images showing the same person.
(Other Embodiments)
 As described above, the embodiments have been described as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited to these, and is also applicable to embodiments in which modifications, replacements, additions, omissions, and the like are made as appropriate. Other embodiments are exemplified below.
 In the first and second embodiments, a configuration was described in which a query image detected in image data input from the outside is matched against the matching target images included in the image list in the storage device 2. In that configuration, examples were described in which at least one of the two kinds of features, namely the features of a plurality of query-side images and the features of a plurality of matching target images, is combined. However, the embodiments of the present disclosure are not limited to these, and any configuration that combines the feature amounts of at least two images may be used.
 For example, in an image processing apparatus according to another embodiment of the present disclosure, the feature extraction unit 15 extracts feature amounts from a plurality of images, each of which shows the object. The attribute determination unit 14 determines to which of a plurality of predetermined object attributes the object attributes of at least two of the plurality of images belong. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of the at least two images. For example, the determination unit 16 compares the attribute information included in the object attributes of the at least two images and determines to combine the feature amounts when the difference in the attribute information is equal to or greater than a threshold. When the determination unit 16 determines to combine the feature amounts of the at least two images, the feature combining unit 17 combines the feature amounts of the at least two images.
 As described above, the embodiments have been described as examples of the technology of the present disclosure. To that end, the accompanying drawings and the detailed description have been provided.
 Therefore, the components described in the accompanying drawings and the detailed description may include not only components essential for solving the problem, but also components that are not essential for solving the problem and are included merely to exemplify the above technology. Accordingly, the mere fact that those non-essential components are described in the accompanying drawings and the detailed description should not immediately lead to the conclusion that they are essential.
 In addition, since the above-described embodiments are intended to exemplify the technology of the present disclosure, various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.
 The present disclosure is applicable to image processing technology, for example image search technology and image matching technology.
1 control unit
2 storage device
3 image acquisition unit
4 output interface
5 input interface
6 camera
11 person detection unit
12 query determination unit
13 query tracking unit
14 attribute determination unit
15 feature extraction unit
16 determination unit
17 feature combining unit
18 matching unit
21 feature extraction model
22, 222 image list
50, 250 image data
100, 200 image processing apparatus

Claims (10)

  1.  An image processing device comprising:
     a feature extraction unit that extracts, from a plurality of images each showing the same object, a feature amount indicating a feature of the object;
     an attribute determination unit that determines an object attribute of the object included in at least two images among the plurality of images; and
     a determination unit that determines, based on the object attributes determined by the attribute determination unit, to combine the feature amounts of the at least two images when the object attributes of the at least two images are different from each other.
  2.  The image processing device according to claim 1, further comprising a feature combining unit that combines the feature amounts of the at least two images when the determination unit determines to combine the feature amounts of the at least two images.
  3.  The image processing device according to claim 2, further comprising a matching unit that, when the feature amounts of the at least two images have been combined by the feature combining unit, matches the combined feature amount against a feature amount of an image other than the at least two images.
  4.  The image processing device according to claim 3, wherein the matching unit determines that the object shown in the at least two images and an object shown in the image other than the at least two images match when a degree of similarity between the combined feature amount and the feature amount of the image other than the at least two images is equal to or greater than a predetermined threshold.
  5.  The image processing device according to any one of claims 1 to 4, further comprising:
     an image acquisition unit that receives a time-series image data group;
     an object detection unit that detects at least one object in the image data group;
     a query determination unit that determines an image including one of the at least one object detected by the object detection unit as a query image;
     a query tracking unit that searches for and tracks, in chronological order in the image data group, the object indicated by the query image; and
     a storage device that stores an image list including a group of matching target images against which the object included in the query image is matched,
     wherein the feature extraction unit extracts feature amounts from the query image determined by the query determination unit, the tracking image tracked by the query tracking unit, and the group of matching target images, and
     the attribute determination unit determines the object attribute of the object included in the query image and the object attribute of the object included in the tracking image as the object attributes of the objects included in the at least two images.
  6.  The image processing device according to any one of claims 1 to 4, further comprising:
     an image acquisition unit that receives image data;
     an object detection unit that detects at least one object in the image data;
     a query determination unit that determines an image including one of the at least one object detected by the object detection unit as a query image; and
     a storage device that stores an image list including a group of matching target images against which the object included in the query image is matched,
     wherein the feature extraction unit extracts a feature amount of each image in the group of matching target images, and
     the attribute determination unit determines the object attribute of the object included in each image of the group of matching target images as the object attributes of the objects included in the at least two images.
  7.  The image processing device according to any one of claims 1 to 6, wherein the object is a person, and the object attributes include a plurality of orientations of the person.
  8.  An image processing method comprising:
     a feature extraction step of extracting, from a plurality of images each showing the same object, a feature amount indicating a feature of the object;
     an attribute determination step of determining an object attribute of the object included in at least two images among the plurality of images; and
     a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images are different from each other.
  9.  The image processing method according to claim 8, further comprising a feature combining step of combining the feature amounts of the at least two images when it is determined in the determination step to combine the feature amounts of the at least two images.
  10.  A program for causing a control unit to execute the image processing method according to claim 8 or 9.
PCT/JP2022/024400 2021-07-09 2022-06-17 Image processing device, image processing method, and program WO2023282033A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-114127 2021-07-09
JP2021114127 2021-07-09

Publications (1)

Publication Number Publication Date
WO2023282033A1 true WO2023282033A1 (en) 2023-01-12

Family

ID=84800240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/024400 WO2023282033A1 (en) 2021-07-09 2022-06-17 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2023282033A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018120644A (en) * 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program
JP2020095757A (en) * 2020-03-23 2020-06-18 キヤノン株式会社 Information processing device, information processing method, and program
JP2021101384A (en) * 2017-12-18 2021-07-08 株式会社東芝 Image processing apparatus, image processing method and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021101384A (en) * 2017-12-18 2021-07-08 株式会社東芝 Image processing apparatus, image processing method and program
JP2018120644A (en) * 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program
JP2020095757A (en) * 2020-03-23 2020-06-18 キヤノン株式会社 Information processing device, information processing method, and program

Similar Documents

Publication Publication Date Title
Jegham et al. Vision-based human action recognition: An overview and real world challenges
An et al. Performance evaluation of model-based gait on multi-view very large population database with pose sequences
Gong et al. Structured time series analysis for human action segmentation and recognition
Dantone et al. Human pose estimation using body parts dependent joint regressors
Pu et al. Facial expression recognition from image sequences using twofold random forest classifier
JP5682563B2 (en) Moving object locus identification system, moving object locus identification method, and moving object locus identification program
Pazhoumand-Dar et al. Joint movement similarities for robust 3D action recognition using skeletal data
Kusakunniran et al. Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model
Chattopadhyay et al. Frontal gait recognition from incomplete sequences using RGB-D camera
JP2021101384A (en) Image processing apparatus, image processing method and program
Križaj et al. Adaptation of SIFT features for face recognition under varying illumination
Haber et al. A practical approach to real-time neutral feature subtraction for facial expression recognition
Shyam et al. A taxonomy of 2D and 3D face recognition methods
Zhu et al. Human action recognition using multi-layer codebooks of key poses and atomic motions
Hsu et al. Fast landmark localization with 3D component reconstruction and CNN for cross-pose recognition
Xia et al. Face occlusion detection using deep convolutional neural networks
Wang et al. A new hand gesture recognition algorithm based on joint color-depth superpixel earth mover's distance
Tathe et al. Human face detection and recognition in videos
Abedi et al. Modification of deep learning technique for face expressions and body postures recognitions
WO2023282033A1 (en) Image processing device, image processing method, and program
JP7434914B2 (en) Pedestrian object detection device and method, electronic equipment
Zhang et al. A review of human action recognition in video
Elaoud et al. Analysis of skeletal shape trajectories for person re-identification
dos Santos Jangua et al. Human Identification Based on Gait and Soft Biometrics
Khokhlova et al. 3D visual-based human motion descriptors: a review

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22837451

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE