WO2023282033A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program

Info

Publication number
WO2023282033A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
feature
unit
query
Prior art date
Application number
PCT/JP2022/024400
Other languages
French (fr)
Japanese (ja)
Inventor
拓実 小島
俊介 安木
祐介 加藤
Original Assignee
パナソニックIpマネジメント株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by パナソニックIPマネジメント株式会社
Publication of WO2023282033A1 publication Critical patent/WO2023282033A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present disclosure relates to an image processing device, an image processing method, and a program.
  • Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
  • An object of the present disclosure is to provide an image processing device, an image processing method, and a program capable of extracting features of an object appearing in an image with higher precision than the conventional technology.
  • One aspect of the present disclosure is an image processing device comprising: a feature extraction unit that extracts a feature amount indicating a feature of an object from a plurality of images each showing the same object; an attribute determination unit that determines an object attribute of the object included in at least two of the plurality of images; and a determination unit that, based on the object attributes determined by the attribute determination unit, determines to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
  • Another aspect of the present disclosure is an image processing method comprising: a feature extraction step of extracting a feature amount indicating a feature of an object from a plurality of images each showing the object; an attribute determination step of determining an object attribute of the object included in at least two of the plurality of images; and a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
  • A further aspect of the present disclosure provides a program for causing a control unit to execute the above image processing method.
  • According to the image processing device, the image processing method, and the program according to the present disclosure, the features of an object appearing in an image can be extracted with higher accuracy than in the prior art.
  • FIG. 3 is a flowchart of image processing executed by the image processing apparatus according to the first embodiment; FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus according to the first embodiment.
  • FIG. 5 is a block diagram showing a configuration example of the image processing apparatus according to the second embodiment; FIG. 6 is a flowchart of image processing executed by the image processing apparatus according to the second embodiment; FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus according to the second embodiment.
  • Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
  • However, with conventional image matching techniques, even when the query image and the matching target image both show the same person, if attributes such as the orientation, posture, or clothing of the person differ between the two images, the similarity between the features of the two images is calculated to be low, and the person may not be recognized as the same person.
  • For example, if the query image is a front-facing image in which a person's face is visible, while the matching target image shows the same person facing backward, the two images may not be recognized as showing the same person.
  • the present inventors have conducted research to solve the above problems and have developed an image matching device, an image matching method, and a program that extract the features of an object shown in an image more accurately than the conventional technology.
  • FIG. 1 is a schematic diagram showing an overview of an image processing device 100 according to an embodiment of the present disclosure.
  • An example of the image processing device 100 is an image matching device that matches the query images 50a and 50b with the matching target image 22a.
  • the query images 50a and 50b are, for example, human images detected from captured images generated by a surveillance camera, and the matching target image 22a is, for example, an image showing a person being searched.
  • the image processing device 100 acquires query images 50a and 50b, and detects the orientation and feature amount of a person appearing in each image.
  • the "feature amount” is an amount representing the features of an object such as a person appearing in an image. is represented by a vector quantity including the gradient of
  • When the query images 50a and 50b show the same person but the person's orientation differs between the two images, the image processing device 100 combines the feature amounts of the two images in which the object appears. When the feature amounts have been combined, the image processing device 100 compares the combined feature amount with the feature amount of the matching target image 22a for matching.
  • “combination" of a plurality of feature amounts assumes vector operation. For example, "combining" a plurality of feature amounts includes averaging the plurality of feature amounts, calculating a direct sum of the plurality of feature amounts, and calculating a difference between the plurality of feature amounts.
  • "combination" of a plurality of feature quantities may include weighting processing for increasing and emphasizing the feature when there is a common feature among the plurality of feature quantities.
  • The combined feature amount may have a different number of dimensions from the feature amount it is compared against.
  • In that case, the shape of the feature amount to be compared may be adjusted to match the combined feature amount. For example, if combining two feature amounts by direct sum changes the number of dimensions, the feature amount to be compared may be repeated (doubled) so that the combined feature amount and the feature amount to be compared have the same number of dimensions.
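  • The following is a minimal sketch of these combination operations, assuming the feature amounts are NumPy vectors; the function names and the particular choice of operations are illustrative assumptions, not part of the disclosure. See also the matching sketch given later for how a combined feature amount can be compared against a target.

```python
import numpy as np

def combine_average(features):
    # Average several feature vectors of equal length (one form of "combining").
    return np.mean(np.stack(features), axis=0)

def combine_direct_sum(features):
    # Direct sum: concatenate the vectors, which changes the number of dimensions.
    return np.concatenate(features)

def combine_difference(a, b):
    # Difference between two feature vectors of equal length.
    return a - b

def emphasize_common(a, b, gain=2.0, tol=1e-3):
    # Weighting that amplifies elements the two vectors (approximately) share.
    combined = (a + b) / 2.0
    combined[np.isclose(a, b, atol=tol)] *= gain
    return combined

def match_shape(target_feature, combined_feature):
    # If a direct sum doubled the dimensionality, tile the comparison-target
    # feature so both vectors have the same number of dimensions.
    repeats = combined_feature.size // target_feature.size
    return np.tile(target_feature, repeats)
```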
  • As a method of matching the combined feature amount against the feature amount to be compared, for example, the inner product of the combined feature amount and the feature amount obtained from the comparison target image is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different people).
  • Compared with an uncombined feature amount, the combined feature amount contains multifaceted information that views the person from different directions. Therefore, when the query images 50a and 50b and the matching target image 22a all show the same person, the combined feature amount of the query images 50a and 50b and the feature amount of the matching target image 22a have more matching elements than when the feature amounts are not combined. Conversely, when the query images 50a and 50b show the same person but the matching target image 22a shows a different person, the combined feature amount and the feature amount of the matching target image 22a have fewer matching elements than when the feature amounts are not combined. Combining the feature amounts of a plurality of images showing the same person in different orientations therefore improves the accuracy of matching the query images 50a and 50b against the matching target image 22a.
  • In matching, the image processing apparatus 100 determines, for example, a similarity indicating how similar two feature amounts are, and judges that the same person appears in both images when the similarity is equal to or greater than a predetermined threshold.
  • FIG. 2 is a block diagram showing a configuration example of the image processing apparatus 100 according to the first embodiment of the present disclosure.
  • The image processing apparatus 100 includes a control unit 1, a storage device 2, an image acquisition unit 3 that acquires image data 50, an input interface (I/F) 5, and an output interface (I/F) 4.
  • the control unit 1 implements the functions of the image processing apparatus 100 by executing information processing. Such information processing is realized by executing a program stored in the storage device 2 by the control unit 1, for example.
  • the control unit 1 includes a person detection unit 11, a query determination unit 12, a query tracking unit 13, an attribute determination unit 14, a feature extraction unit 15, a determination unit 16, a feature combination unit 17, and a matching unit 18. including.
  • the control unit 1 is composed of circuits such as a CPU, MPU, and FPGA.
  • the human detector 11 detects a human within the image data 50 .
  • a query determination unit 12 determines a query image. For example, the query determining unit 12 determines one of the human images detected by the human detecting unit 11 as the query image.
  • the query tracking unit 13 tracks the person indicated by the query image determined by the query determination unit 12 in the time-series image data group of the image data 50 .
  • the attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the input image, for example, the query image belongs.
  • the feature extraction unit 15 extracts feature amounts from an input image such as a query image.
  • the feature extraction unit 15 may use a feature extraction model 21 that inputs an image and outputs a feature amount vector of the image in order to extract the feature amount.
  • the determination unit 16 determines whether or not the object attribute of the person included in the query image determined by the query determination unit 12 has changed in the time-series image data group.
  • the feature combining unit 17 combines feature amounts of a plurality of images.
  • The matching unit 18 compares the combined feature amount of the plurality of images produced by the feature combining unit 17 with the feature amount of another image, thereby matching the person shown in the plurality of images against the person included in the other image.
  • a detailed example of the function of each component will be described later in relation to the operation of the image processing apparatus 100 .
  • the storage device 2 is a recording medium for recording various information including programs and data for causing the control unit 1 to execute image processing by the image processing device 100 .
  • the storage device 2 stores a later-described feature extraction model 21 that is a trained model, and an image list 22 that includes a matching target image group that is a matching target of the query image.
  • the storage device 2 is realized by, for example, a semiconductor storage device such as a flash memory, a solid state drive (SSD), a magnetic storage device such as a hard disk drive (HDD), or other recording media alone or in combination.
  • the storage device 2 may include volatile memory such as SRAM and DRAM.
  • the storage device 2 may be any of an internal type, an external type, and a NAS (network-attached storage) type.
  • the image acquisition unit 3 is an interface circuit that connects the image processing device 100 and external devices in order to input information such as the image data 50 to the image processing device 100 .
  • Such an external device is, for example, another information processing terminal (not shown) or a device such as a camera that acquires the image data 50 .
  • the image acquisition unit 3 may be a communication circuit that performs data communication according to existing wired communication standards or wireless communication standards.
  • the input interface 5 is an interface circuit that connects the image processing device 100 and an input device 80 such as a keyboard and a mouse in order to receive user input.
  • the input interface 5 may be a communication circuit that performs data communication according to existing wired communication standards or wireless communication standards.
  • the output interface 4 is an interface circuit that connects the image processing device 100 and an external output device in order to output information from the image processing device 100 .
  • Such external output devices include, for example, information processing terminals such as smartphones and tablets, and displays.
  • the output interface 4 may be a communication circuit that is connected to a network and performs data communication according to existing wired communication standards or wireless communication standards.
  • the image acquisition unit 3, input interface 5 and output interface 4 may be realized by separate or common hardware.
  • FIG. 3 is a flowchart of image processing executed by the image processing apparatus 100 .
  • the control unit 1 acquires the image data 50 via the image acquisition unit 3 (S101).
  • the image data 50 is, for example, a group of time-series image data captured by a camera installed in the city, on the premises, or the like.
  • the control unit 1 may sequentially acquire the image data 50 captured by the camera as frames in real time. Alternatively, the image data 50 may be recorded data recorded in advance.
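  • As a rough sketch of step S101, and under the assumption that the camera stream or a recording is read with OpenCV, frames could be acquired as follows; the video source and the frame handling shown here are illustrative only.

```python
import cv2

def acquire_frames(source=0):
    """Yield frames from a live camera (source=0) or a pre-recorded file path (S101)."""
    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of recording or camera error
                break
            yield frame
    finally:
        capture.release()

# Example: iterate over recorded data instead of a live camera.
# for frame in acquire_frames("recorded_data.mp4"):
#     process(frame)
```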
  • the human detection unit 11 detects a person in the image data 50 acquired in step S101 (S102).
  • detecting a person in the image data 50 includes detecting an area where a person exists in the image data 50 and detecting a person image.
  • the query determination unit 12 determines a query image (S103). For example, the query determining unit 12 determines one of the plurality of human images detected in step S102 as the query image.
  • Alternatively, the query determination unit 12 may set, as the query image, a person image selected by the user with the input device 80, such as a keyboard or mouse, from among the plurality of person images detected in step S102.
  • the query determination unit 12 may use an image of a person stored in advance in the storage device 2, an image of a person input via the image acquisition unit 3, or the like as a query image.
  • The feature extraction unit 15 extracts a feature amount (referred to as the "pre-change feature amount", in contrast to the post-change feature amount described later) from the query image (S104).
  • the feature extraction unit 15 may use a feature extraction model 21 that inputs an image and outputs a feature amount vector of the image in order to extract the feature amount.
  • a feature extraction model 21 is a trained model constructed by having the model learn the relationship between the learning image and the correct information.
  • The feature extraction model 21, which is a trained model, may be a model having the structure of a neural network, for example a convolutional neural network (CNN).
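  • As one hedged illustration of such a trained CNN feature extractor, a pretrained torchvision backbone with its classification head removed could serve as a stand-in; this is an assumed example, not the specific feature extraction model 21 of the disclosure.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-18 backbone with the final classification layer removed,
# so the output is a 512-dimensional feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(person_image):
    """Return a feature vector for a cropped person image (H x W x 3 uint8 array)."""
    with torch.no_grad():
        batch = preprocess(person_image).unsqueeze(0)
        return backbone(batch).squeeze(0).numpy()
```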
  • the attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the query image belongs (S105). For example, the attribute determining unit 14 assigns one of a plurality of predetermined object attributes to the query image.
  • An example of a predetermined object attribute is the orientation of the person in the image.
  • For example, the attribute determination unit 14 detects the orientation of the person in the query image by comparing the feature amount vector of the query image output by the feature extraction model 21 with feature amount vectors of person images having predetermined orientations.
  • the orientation of the person is, for example, the orientation of the face of the person in the query image, the orientation of the upper half of the body, the orientation of the lower half of the body, or an orientation determined by combining these pieces of information.
  • the attribute determination unit 14 may use an orientation detection model that outputs the orientation of the person in the human image by inputting the image of the person.
  • an orientation detection model is a trained model constructed by having the model learn the relationship between the learning image and the correct information.
  • a known skeleton detector, posture detector, and face orientation detector may be applied to the attribute determination unit 14 .
  • The orientation of the person detected in this way can be classified, for example, into eight directions as seen from the person in the image: forward, obliquely forward right, right, obliquely backward right, backward, obliquely backward left, left, and obliquely forward left.
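  • A simple sketch of this eight-way classification, assuming the orientation has first been estimated as an angle in degrees (0 = facing forward, increasing clockwise); the binning scheme below is an assumption for illustration.

```python
EIGHT_DIRECTIONS = [
    "forward", "obliquely forward right", "right", "obliquely backward right",
    "backward", "obliquely backward left", "left", "obliquely forward left",
]

def classify_orientation(angle_deg):
    """Map an estimated body or face orientation angle to one of eight direction labels."""
    index = int(((angle_deg % 360) + 22.5) // 45) % 8
    return EIGHT_DIRECTIONS[index]

# classify_orientation(10)  -> "forward"
# classify_orientation(170) -> "backward"
```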
  • the object attribute determined by the attribute determination unit 14 is not limited to the orientation of the person, and may be attributes such as the person's height, body type, hairstyle, and the like. A person's height, body shape, and hairstyle can be easily estimated from the image itself, for example, using well-known image recognition techniques.
  • the object attribute may be an attribute representing whether or not a person is wearing clothes of a specific shape such as a suit.
  • An object attribute may be an attribute representing whether a person is holding a bag, carrying a backpack, pulling a suitcase, making a phone call, and the like.
  • the object attribute may be an attribute indicating whether a person is riding a vehicle such as a bicycle or motorcycle, walking, running, stationary, standing or sitting, and the like.
  • The attribute determination unit 14 may detect the pose of the person in the query image and estimate an object attribute as described above based on the detected pose. Alternatively, the attribute determination unit 14 may use an attribute detection model that takes a person image as input and outputs the object attribute of that image. Such an attribute detection model is a trained model constructed by having the model learn the relationship between learning images and correct answer information.
  • the object attributes determined by the attribute determining unit 14 are not limited to human attributes, and may be attributes of non-human objects.
  • the object attribute may be the color, material, shape, etc. of the object.
  • The determination unit 16 determines whether the object attribute of the person included in the query image determined in step S103 has changed in the time-series image data group (S106). For example, the determination unit 16 compares the attribute information included in the object attributes of the person appearing in the query image and in the tracking image described later, and determines that the object attribute has changed if the difference in the attribute information is equal to or greater than a threshold, or if the classification information included in the attribute information differs. For example, if the person in the query image is carrying a package, the query image has a value of 1 for the package attribute information; if the person in the tracking image is not carrying a package, the tracking image has a value of 0 for that attribute information, and the difference between the two values is 1.
  • another example of the difference in attribute information is the difference between the direction in which the person in the query image is facing and the direction in which the person in the tracking image is facing, expressed as an angle.
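  • A minimal sketch of this determination in step S106, under the assumptions that object attributes are held as numeric attribute values plus a direction label and that a fixed angle threshold is used; the attribute names and thresholds are illustrative.

```python
DIRECTION_ANGLES = {"forward": 0, "right": 90, "backward": 180, "left": 270}

def attribute_changed(query_attrs, track_attrs, value_threshold=1, angle_threshold=90):
    """Return True if the object attribute is judged to have changed (S106)."""
    # Numeric attribute information (e.g. carrying a package: 1, not carrying: 0).
    for key in ("has_package", "riding_vehicle"):
        if abs(query_attrs.get(key, 0) - track_attrs.get(key, 0)) >= value_threshold:
            return True
    # Orientation expressed as an angle: compare the two facing directions.
    a = DIRECTION_ANGLES[query_attrs["direction"]]
    b = DIRECTION_ANGLES[track_attrs["direction"]]
    diff = abs(a - b) % 360
    return min(diff, 360 - diff) >= angle_threshold

# attribute_changed({"direction": "backward", "has_package": 1},
#                   {"direction": "forward",  "has_package": 1})  -> True
```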
  • If the object attribute has changed (Yes in S106), the determination unit 16 proceeds to step S108. In this case, the determination unit 16 has determined to combine the pre-change feature amount with the post-change feature amount described later.
  • If the object attribute has not changed (No in S106), the determination unit 16 proceeds to step S107.
  • In step S107, the query tracking unit 13 searches the time-series image data group of the image data 50 for the person indicated by the query image determined in step S103 (S107). For example, based on the position in the image of the person detected or tracked in a specific frame of the image data group, the query tracking unit 13 tracks that person in subsequent frames captured after the specific frame.
  • The person indicated by the query image determined in step S103 and the person indicated by the image tracked in step S107 (the tracking image) are the same person and are indicated, for example, by the same identification information (ID).
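  • A rough sketch of such position-based tracking, assuming each detection is an axis-aligned bounding box and that the detection in the next frame that best overlaps the previous position (by IoU) takes over the same ID; IoU matching is an assumed concrete choice, not stated in the disclosure.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def track_next(prev_box, detections, min_iou=0.3):
    """Pick the detection in the next frame that best overlaps the previous position (S107)."""
    best = max(detections, key=lambda box: iou(prev_box, box), default=None)
    return best if best is not None and iou(prev_box, best) >= min_iou else None
```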
  • Next, the feature extraction unit 15 extracts the feature amount of the person included in the tracking image after the attribute change (the post-change feature amount) (S108).
  • the feature combining unit 17 combines the pre-change feature amount extracted in step S104 and the post-change feature amount extracted in step S108 (S109).
  • In step S110, the matching unit 18 compares the feature amount combined in step S109 with the feature amount of each image in the matching target image group in the image list 22, thereby matching the person indicated by the query image against the person included in each image in the image list 22.
  • When the similarity between the compared feature amounts is equal to or greater than a predetermined threshold, the matching unit 18 determines that the person shown in the query image matches the person shown in the corresponding image in the image list 22.
  • the degree of similarity as described above is calculated, for example, by a predetermined degree of similarity calculation algorithm.
  • the matching unit 18 calculates the degree of similarity based on comparison of feature amount vectors.
  • The predetermined similarity calculation algorithm is, for example, an algorithm that calculates the similarity so that it becomes larger as the distance, such as the Euclidean distance or the Mahalanobis distance, between the feature amount vector combined in step S109 and the feature amount vector of each image in the image list 22 becomes smaller, or as their inner product becomes larger.
  • the predetermined similarity calculation algorithm may be an algorithm that applies a model constructed by metric learning to calculate the distance between a plurality of feature amount vectors.
  • the degree of similarity means, for example, that the larger the value, the higher the degree of matching between the compared two feature amount vectors.
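  • For illustration, a few similarity calculations of the kinds described above, assuming the feature amount vectors are NumPy arrays; which calculation the device actually uses is not fixed here, so these are hedged examples.

```python
import numpy as np

def similarity_from_distance(a, b):
    # Smaller Euclidean distance -> larger similarity, mapped into (0, 1].
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def inner_product_similarity(a, b):
    # Larger inner product -> larger similarity.
    return float(np.dot(a, b))

def cosine_similarity(a, b):
    # Inner product of the normalized vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_person(query_feature, target_feature, threshold=0.7):
    """Threshold decision of the kind used in matching (S110)."""
    return cosine_similarity(query_feature, target_feature) >= threshold
```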
  • Note that the number of dimensions of the combined feature amount vector may differ from that of the feature amount vector of the comparison target image. For example, when two feature amount vectors are combined by direct sum so that the number of dimensions changes from before combination, the feature amount vector of the comparison target image may be repeated (doubled) so that the combined feature amount vector and the feature amount vector of the comparison target image have the same number of dimensions.
  • In step S106, if the object attribute of the person included in the query image or the tracking image has not changed, the process returns to step S105 via step S107.
  • If the object attribute of the person included in the tracking image does not change, the control unit 1 may end the process of FIG. 3 without combining feature amounts.
  • In that case, the matching unit 18 may compare the feature amount extracted from the query image (the pre-change feature amount) with the feature amount of each image in the image list 22 to match the person indicated by the query image against the person included in each image in the image list 22.
  • After the process of FIG. 3 is completed, the query determination unit 12 may determine another of the plurality of person images detected in step S102 as the query image and execute the subsequent processes of steps S103 to S110 again. In this way, the control unit 1 may repeatedly execute the process of FIG. 3 so that it is carried out for every person appearing in the image data 50.
  • FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus 100.
  • FIG. 4 shows time-series image data 50 captured by the camera 6.
  • the control unit 1 detects a person in the image data 50 (S102), and sets the detected person image as a query image 50c (S103). At the same time or after this, the control unit 1 extracts the pre-change feature amount from the query image 50c (S104), and detects the object attribute of the person included in the query image 50c (S105).
  • the query image 50c has an object attribute of "backward".
  • Next, the control unit 1 tracks the person indicated by the query image 50c in the time-series image data 50 and detects the tracking image 50d (S107).
  • the control unit 1 detects the object attribute of the person included in the tracking image 50d (S105).
  • the person included in the tracking image 50d has the object attribute of "backward facing". Therefore, the process proceeds to No in step S106. In this manner, the control unit 1 repeats the loop until the object attribute of the person included in the tracking images changes, and further detects the tracking images 50e and 50f.
  • the person included in the tracking image 50e also has a "backward facing” object attribute, while the person included in the tracking image 50f has a "forward facing” object attribute.
  • the control unit 1 determines that the object attribute has changed from “backward facing” to "forward facing” (Yes in S106).
  • the control unit 1 extracts the feature amount of the tracking image 50f as the post-change feature amount (S108), and combines the pre-change feature amount and the post-change feature amount (S109).
  • the control unit 1 compares the query image with each image in the image list 22 by comparing the combined feature amount with each feature amount in the matching target image group in the image list 22 (S110 ).
  • As a matching method, for example, the inner product of the combined feature amount vector and the feature amount vector obtained from the image to be compared is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different persons). In this manner, the image processing apparatus 100 can accurately perform matching using the features of a plurality of query images showing the same person.
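  • Putting the steps of FIG. 3 together, the following is a compact sketch of the first-embodiment flow (S101-S110); the helper callables are passed in as parameters and, like the averaging and cosine-similarity choices, are assumptions made for illustration.

```python
import numpy as np

def match_with_attribute_change(frames, detect_person, extract_feature,
                                determine_attribute, image_list, threshold=0.7):
    """Sketch of the FIG. 3 flow: track a query person, combine features across an
    attribute change, then match against each image in the list (illustrative only).
    Tracking is simplified to re-detecting one person per frame."""
    query = detect_person(frames[0])                       # S102, S103
    pre_feature = extract_feature(query)                   # S104
    pre_attribute = determine_attribute(query)             # S105
    combined = pre_feature
    for frame in frames[1:]:                               # S107: follow the person
        tracked = detect_person(frame)
        if tracked is None:
            continue
        if determine_attribute(tracked) != pre_attribute:  # S106: attribute changed?
            post_feature = extract_feature(tracked)        # S108
            combined = (pre_feature + post_feature) / 2.0  # S109: combine (average)
            break
    results = []
    for target_image in image_list:                        # S110: match against the list
        target = extract_feature(target_image)
        sim = float(np.dot(combined, target) /
                    (np.linalg.norm(combined) * np.linalg.norm(target)))
        results.append(sim >= threshold)
    return results
```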
  • As described above, the image processing apparatus 100 includes the image acquisition unit 3, the person detection unit 11 (an example of an object detection unit), the query determination unit 12, the query tracking unit 13, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2.
  • the image acquisition unit 3 receives image data 50 including time-series image data groups.
  • the person detection unit 11 detects a person, which is an example of a target object, in the image data group.
  • the query determining unit 12 determines an image including one of the plurality of human images detected by the human detecting unit 11 as a query image.
  • the query tracking unit 13 searches and tracks the person indicated by the query image in chronological order in the image data group.
  • the storage device 2 stores an image list 22 including a matching target image group which is a matching target of a person included in a query image.
  • the feature extraction unit 15 extracts feature amounts from the query image determined by the query determination unit 12, the tracking image tracked by the query tracking unit 13, and the matching target image group.
  • the attribute determination unit 14 determines the object attributes of the person included in the query image and the object attribute of the person included in the tracking image. Based on the object attribute determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of at least two images of the query image and the tracking image. When the determining unit 16 determines to combine the feature amounts of at least two images of the query image and the tracking image, the feature combining unit 17 combines the feature amounts of the at least two images.
  • the image processing apparatus 100 can combine the features of a plurality of images (query image and tracking image) showing the same person, and extract the features of the person appearing in the image more accurately than in the conventional technology.
  • the determination unit 16 may determine to combine the feature amounts of the at least two images when the at least two images show the same person and the object attributes of the at least two images are different from each other.
  • the image processing apparatus 100 can combine the features of a plurality of images of the same person with different object attributes, and extract the features of the person in the image more accurately and efficiently.
  • The image processing apparatus 100 may further include a matching unit 18 that, when the feature combining unit 17 combines the feature amounts of the at least two images, matches the combined feature amount against the feature amounts of images in the image list 22 other than the at least two images.
  • the image processing apparatus 100 can accurately perform matching using the features of a plurality of images showing the same person.
  • When the similarity between the combined feature amount and the feature amount of an image other than the at least two images is equal to or greater than a predetermined threshold, the matching unit 18 may determine that the object shown in the at least two images matches the object shown in the image other than the at least two images.
  • FIG. 5 is a block diagram showing a configuration example of the image processing apparatus 200 according to the second embodiment of the present disclosure.
  • the storage device 2 stores an image list 222 instead of the image list 22 .
  • the image list 222 includes a plurality of images of the same person with different object attributes (see FIG. 7).
  • Each image in image list 222 includes, for example, the ID of the person represented by each image.
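  • One way to picture the image list 222 is as a small gallery of records keyed by person ID and object attribute; this data layout is an assumption made for illustration.

```python
# Illustrative gallery entries: the same person ID appears with different object attributes.
image_list_222 = [
    {"person_id": "X", "attribute": "forward",  "image_path": "x_front.jpg"},
    {"person_id": "X", "attribute": "backward", "image_path": "x_back.jpg"},
    {"person_id": "Y", "attribute": "forward",  "image_path": "y_front.jpg"},
    {"person_id": "Y", "attribute": "backward", "image_path": "y_back.jpg"},
]
```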
  • FIG. 6 is a flowchart illustrating the procedure of image processing executed by the control unit 1 of the image processing apparatus 200 according to the second embodiment.
  • the control unit 1 acquires image data 250 via the image acquisition unit 3 (S201).
  • the image data 250 may be one image, unlike the image data 50 of the first embodiment that includes time-series image data groups.
  • the human detection unit 11 detects a person in the image data 250 acquired in step S201 (S202).
  • the query determination unit 12 determines a query image (S203). For example, the query determining unit 12 determines one of the one or more human images detected in step S202 as the query image.
  • the feature extraction unit 15 extracts the feature amount of each image in the image list 222 (S204).
  • the attribute determining unit 14 detects to which of a plurality of predetermined object attributes the object attribute of a person included in each image of the image list 222 belongs (S205).
  • The determination unit 16 determines whether the image list 222 contains a plurality of images that show the same person but belong to different object attributes (S206). If the determination unit 16 determines that there are such images (Yes in S206), the process proceeds to step S207; otherwise (No in S206), the process proceeds to step S208. For example, if a plurality of images in the image list 222 show the same person, with some showing the person facing forward and others showing the person facing backward, the determination unit 16 determines that there are a plurality of images that show the same person but belong to different object attributes.
  • the feature combining unit 17 combines feature amounts of a plurality of images showing the same person but belonging to different object attributes (S207). The feature amount of each of the multiple images has already been extracted in step S204.
  • The matching unit 18 compares the feature amount combined in step S207 with the feature amount of the query image, thereby matching each image in the image list 222 against the query image (S208). When the process has proceeded from No in step S206 and feature amounts have not been combined in step S207, the matching unit 18 compares the feature amount of each image in the image list 222 with the feature amount of the query image to match each image in the image list 222 against the query image (S208).
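  • A minimal sketch of the second-embodiment flow (S204-S208), assuming the gallery is a list of records like the one sketched above and that features are NumPy vectors; grouping by person ID and averaging are illustrative choices, not prescribed by the disclosure.

```python
import numpy as np
from collections import defaultdict

def match_against_gallery(query_feature, gallery, extract_feature, threshold=0.7):
    """Combine gallery features per person across different attributes (S206, S207),
    then match the query against each combined feature (S208). Illustrative only."""
    grouped = defaultdict(list)
    for entry in gallery:               # S204, S205: feature and attribute per image
        grouped[entry["person_id"]].append(extract_feature(entry["image_path"]))

    matches = {}
    for person_id, features in grouped.items():
        if len(features) > 1:           # same person, different attributes -> combine
            feature = np.mean(np.stack(features), axis=0)
        else:
            feature = features[0]
        sim = float(np.dot(query_feature, feature) /
                    (np.linalg.norm(query_feature) * np.linalg.norm(feature)))
        matches[person_id] = sim >= threshold
    return matches
```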
  • FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus 200 according to the second embodiment.
  • FIG. 7 shows image data 250 captured by camera 6 .
  • the camera 6 is installed, for example, in the premises of a building, factory, or the like.
  • the image list 222 includes a plurality of images of the same person with different object attributes.
  • FIG. 7 shows an example in which the image list 222 includes a front-facing image of person X, a back-facing image of person X, a front-facing image of person Y, a back-facing image of person Y, a front-facing image of person Z, and a back-facing image of person Z.
  • The image list 222 is, for example, an employee image database containing images of employees who may be working on the premises.
  • the control unit 1 detects a person in the image data 250 (S202), and uses the detected person image as a query image (S203).
  • the control unit 1 extracts the feature amount of each image in the image list 222 (S204), and determines the object attribute of each image (S205).
  • The control unit 1 determines that the image list 222 includes a plurality of images showing the same person but belonging to different object attributes (Yes in S206), and combines the feature amount of the front-facing image of person X with the feature amount of the back-facing image of person X.
  • The feature amounts of the images of persons Y and Z are combined in the same way (S207).
  • the control unit 1 compares each image in the image list 222 with the query image by comparing the combined feature amount and the feature amount of the query image.
  • In this way, the image processing apparatus 200 can perform matching with high accuracy using the features of a plurality of images of the same person included in the image list 222. Furthermore, since the feature amounts of a plurality of images in the image list 222 are combined, the image captured by the camera 6 does not have to be compared against every image in the image list 222 individually; the number of comparisons and the amount of computation are reduced, which also improves processing speed.
  • As described above, the image processing apparatus 200 includes the image acquisition unit 3, the person detection unit 11 (an example of an object detection unit), the query determination unit 12, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2.
  • Image acquisition unit 3 receives image data 250 .
  • the person detection unit 11 detects a person, which is an example of a target object, in the image data 250 .
  • the query determining unit 12 determines an image including one of the human images detected by the human detecting unit 11 as a query image.
  • the storage device 2 stores an image list 222 that includes matching target images that are matching targets for people included in the query image.
  • the feature extraction unit 15 extracts the feature amount of each of the matching target image groups.
  • the attribute determination unit 14 determines object attributes of objects included in each of the matching target image groups. Based on the object attribute determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of at least two images in the matching target image group.
  • the feature combining unit 17 combines the feature amounts of at least two images when the determining unit 16 determines to combine the feature amounts of at least two images in the matching target image group.
  • the image processing apparatus 200 can combine the features of multiple query images showing the same person and extract the features of the person appearing in the image more accurately than in the conventional technology.
  • The image processing apparatus 200 may further include a matching unit 18 that, when the feature combining unit 17 combines the feature amounts of the at least two images, matches the combined feature amount against the feature amount of the query image, which is an image other than the at least two images.
  • the image processing apparatus 200 can accurately perform matching using the features of a plurality of matching target image groups showing the same person.
  • the configuration for matching the query image detected in the image data input from the outside with the matching target image included in the image list in the storage device 2 has been described.
  • In these configurations, examples have been described in which the feature amounts of a plurality of query images, or the feature amounts of a plurality of matching target images, are combined.
  • the embodiments of the present disclosure are not limited to this, and may be configured to combine feature amounts of at least two images.
  • the feature extraction unit 15 extracts feature amounts from multiple images each representing a target object.
  • the attribute determination unit 14 determines to which of a plurality of predetermined object attributes the object attributes of at least two images out of the plurality of images belong. Based on the object attributes determined by the attribute determination unit 14, the determination unit 16 determines whether or not to combine the feature amounts of the at least two images. For example, the determination unit 16 compares the attribute information included in the object attributes of the at least two images, and determines to combine the feature amounts when the difference in the attribute information is equal to or greater than a threshold.
  • the feature combining unit 17 combines the feature amounts of the at least two images when the determining unit 16 determines to combine the feature amounts of the at least two images.
  • the present disclosure is applicable to image processing technology, such as image search technology and image matching technology.

Abstract

This image processing device comprises a feature extraction unit, an attribute selection unit, and a determination unit. The feature extraction unit extracts a feature value of a target object from a plurality of images in which the same target object is reflected. The attribute selection unit selects an object attribute of the target object included in at least two of the plurality of images. On the basis of the object attribute selected by the attribute selection unit, the determination unit determines to combine the feature values of the at least two images when there is a difference in the object attribute among the at least two images.

Description

Image processing device, image processing method, and program
The present disclosure relates to an image processing device, an image processing method, and a program.
An image matching technique for matching a query image and a matching target image is known. For example, Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
Japanese Patent No. 6831769
An object of the present disclosure is to provide an image processing device, an image processing method, and a program capable of extracting the features of an object appearing in an image with higher accuracy than the conventional technology.
One aspect of the present disclosure is an image processing device comprising: a feature extraction unit that extracts a feature amount indicating a feature of an object from a plurality of images each showing the same object; an attribute determination unit that determines an object attribute of the object included in at least two of the plurality of images; and a determination unit that, based on the object attributes determined by the attribute determination unit, determines to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
Another aspect of the present disclosure is an image processing method comprising: a feature extraction step of extracting a feature amount indicating a feature of an object from a plurality of images each showing the object; an attribute determination step of determining an object attribute of the object included in at least two of the plurality of images; and a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images differ from each other.
A further aspect of the present disclosure provides a program for causing a control unit to execute the above image processing method.
According to the image processing device, the image processing method, and the program according to the present disclosure, it is possible to extract the features of an object appearing in an image with higher accuracy than in the prior art.
FIG. 1 is a schematic diagram showing an overview of an image processing device according to an embodiment of the present disclosure.
FIG. 2 is a block diagram showing a configuration example of the image processing apparatus according to the first embodiment.
FIG. 3 is a flowchart of image processing executed by the image processing apparatus according to the first embodiment.
FIG. 4 is a schematic diagram for explaining image processing of the image processing apparatus according to the first embodiment.
FIG. 5 is a block diagram showing a configuration example of the image processing apparatus according to the second embodiment.
FIG. 6 is a flowchart of image processing executed by the image processing apparatus according to the second embodiment.
FIG. 7 is a schematic diagram for explaining image processing of the image processing apparatus according to the second embodiment.
(Circumstances leading to the present disclosure)
An image matching technique for matching a query image and a matching target image is known. Such an image matching technique is used, for example, to search for a person to be searched for from among a plurality of captured images generated by a plurality of surveillance cameras installed in towns, on premises, and the like. For example, Patent Literature 1 discloses an image search technique that generates a search query from posture information specified by a user, and searches an image database for images including similar postures according to the search query.
However, with conventional image matching techniques, even when the query image and the matching target image both show the same person, if attributes such as the orientation, posture, or clothing of the person differ between the two images, the similarity between the features of the two images is calculated to be low, and the person may not be recognized as the same person. For example, if the query image is a front-facing image in which a person's face is visible, while the matching target image shows the same person facing backward, the two images may not be recognized as showing the same person.
The present inventors have conducted research to solve the above problems and have developed an image matching device, an image matching method, and a program that extract the features of an object shown in an image more accurately than the conventional technology.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, more detailed description than necessary may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially the same configurations may be omitted. This is to avoid unnecessary verbosity in the following description and to facilitate understanding by those skilled in the art.
Note that the applicant provides the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and does not intend to limit the claimed subject matter thereby.
1. Overview
FIG. 1 is a schematic diagram showing an overview of an image processing device 100 according to an embodiment of the present disclosure. An example of the image processing device 100 is an image matching device that matches the query images 50a and 50b with the matching target image 22a. The query images 50a and 50b are, for example, person images detected from captured images generated by a surveillance camera, and the matching target image 22a is, for example, an image showing a person being searched for.
In FIG. 1, the image processing device 100 acquires the query images 50a and 50b and detects the orientation and the feature amount of the person appearing in each image. Here, the "feature amount" is a quantity representing the features of an object, such as a person, appearing in an image, and is expressed, for example, as a vector quantity including pixel values such as the color, hue, luminance, and brightness of each pixel, and the gradients of pixel values between adjacent pixels.
When the query images 50a and 50b show the same person but the person's orientation differs between the two images, the image processing device 100 combines the feature amounts of the two images in which the object appears. When the feature amounts have been combined, the image processing device 100 compares the combined feature amount with the feature amount of the matching target image 22a for matching. Here, "combining" a plurality of feature amounts refers to a vector operation. For example, combining a plurality of feature amounts includes averaging them, calculating their direct sum, and calculating their difference. Combining a plurality of feature amounts may also include weighting processing that amplifies and emphasizes a feature that is common to the plurality of feature amounts. The combined feature amount may have a different number of dimensions from the feature amount it is compared against; in that case, the shape of the feature amount to be compared may be adjusted to match the combined feature amount. For example, if combining two feature amounts by direct sum changes the number of dimensions, the feature amount to be compared may be repeated (doubled) so that the combined feature amount and the feature amount to be compared have the same number of dimensions. As a method of matching the combined feature amount against the feature amount to be compared, for example, the inner product of the combined feature amount and the feature amount obtained from the comparison target image is calculated; if the inner product is equal to or greater than a certain threshold, the objects in the two images are judged to be the same (for example, the same person), and if it is below the threshold, they are judged to be different (for example, different people).
Compared with an uncombined feature amount, the combined feature amount contains multifaceted information that views the person from different directions. Therefore, when the query images 50a and 50b and the matching target image 22a all show the same person, the combined feature amount of the query images 50a and 50b and the feature amount of the matching target image 22a have more matching elements than when the feature amounts are not combined. Conversely, when the query images 50a and 50b show the same person but the matching target image 22a shows a different person, the combined feature amount and the feature amount of the matching target image 22a have fewer matching elements than when the feature amounts are not combined. In this way, combining the feature amounts of a plurality of images showing the same person in different orientations improves the accuracy of matching the query images 50a and 50b against the matching target image 22a. In matching, the image processing device 100 determines, for example, a similarity indicating how similar two feature amounts are, and judges that the same person appears in both images when the similarity is equal to or greater than a predetermined threshold.
2. First Embodiment
2-1. Configuration
FIG. 2 is a block diagram showing a configuration example of the image processing apparatus 100 according to the first embodiment of the present disclosure. The image processing apparatus 100 includes a control unit 1, a storage device 2, an image acquisition unit 3 that acquires image data 50, an input interface (I/F) 5, and an output interface (I/F) 4.
The control unit 1 implements the functions of the image processing apparatus 100 by executing information processing. Such information processing is realized, for example, by the control unit 1 executing a program stored in the storage device 2. The control unit 1 includes a person detection unit 11, a query determination unit 12, a query tracking unit 13, an attribute determination unit 14, a feature extraction unit 15, a determination unit 16, a feature combining unit 17, and a matching unit 18. The control unit 1 is composed of circuits such as a CPU, an MPU, and an FPGA.
An example of the function of each component of the control unit 1 is described below. The person detection unit 11 detects a person in the image data 50. The query determination unit 12 determines a query image; for example, it determines one of the person images detected by the person detection unit 11 as the query image. The query tracking unit 13 tracks the person indicated by the query image determined by the query determination unit 12 through the time-series image data group of the image data 50. The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of an object included in an input image, for example the query image, belongs. The feature extraction unit 15 extracts a feature amount from an input image such as the query image; to do so, it may use a feature extraction model 21 that takes an image as input and outputs a feature amount vector of the image. The determination unit 16 determines whether the object attribute of the person included in the query image determined by the query determination unit 12 has changed in the time-series image data group. The feature combining unit 17 combines the feature amounts of a plurality of images. The matching unit 18 compares the combined feature amount of the plurality of images with the feature amount of another image, thereby matching the person shown in the plurality of images against the person included in the other image. Detailed examples of the functions of these components are described later in relation to the operation of the image processing apparatus 100.
 The storage device 2 is a recording medium that records various kinds of information, including programs and data for causing the control unit 1 to execute the image processing of the image processing apparatus 100. For example, the storage device 2 stores a feature extraction model 21, which is a trained model described later, and an image list 22 that includes a group of matching target images against which the query image is matched. The storage device 2 is realized by, for example, a semiconductor storage device such as a flash memory or a solid-state drive (SSD), a magnetic storage device such as a hard disk drive (HDD), or another recording medium, alone or in combination. The storage device 2 may include volatile memory such as SRAM or DRAM. The storage device 2 may be of an internal type, an external type, or a NAS (network-attached storage) type.
 The image acquisition unit 3 is an interface circuit that connects the image processing apparatus 100 to external devices in order to input information such as the image data 50 into the image processing apparatus 100. Such an external device is, for example, another information processing terminal (not shown) or a device such as a camera that captures the image data 50. The image acquisition unit 3 may be a communication circuit that performs data communication according to an existing wired or wireless communication standard.
 The input interface 5 is an interface circuit that connects the image processing apparatus 100 to an input device 80 such as a keyboard or a mouse in order to receive user input. The input interface 5 may be a communication circuit that performs data communication according to an existing wired or wireless communication standard.
 The output interface 4 is an interface circuit that connects the image processing apparatus 100 to an external output device in order to output information from the image processing apparatus 100. Such external output devices include, for example, information processing terminals such as smartphones and tablets, and displays. The output interface 4 may be a communication circuit that is connected to a network and performs data communication according to an existing wired or wireless communication standard. The image acquisition unit 3, the input interface 5, and the output interface 4 may be realized by separate hardware or by common hardware.
2-2. Operation
 FIG. 3 is a flowchart of the image processing executed by the image processing apparatus 100.
 The control unit 1 acquires the image data 50 via the image acquisition unit 3 (S101). The image data 50 is, for example, a group of time-series image data captured by a camera installed in a street, on premises, or the like. The control unit 1 may sequentially acquire the image data 50 captured by the camera as frames in real time. Alternatively, the image data 50 may be recorded video data stored in advance.
 The person detection unit 11 detects a person in the image data 50 acquired in step S101 (S102). Here, detecting a person in the image data 50 includes detecting a region where a person exists in the image data 50 and detecting a person image.
 The query determination unit 12 determines a query image (S103). For example, the query determination unit 12 determines one of the plurality of person images detected in step S102 as the query image.
 Alternatively, in step S103, the query determination unit 12 may set, as the query image, a person image that the user selects with the input device 80 such as a keyboard or a mouse from among the plurality of person images detected in step S102. The query determination unit 12 may also use, as the query image, an image of a person stored in advance in the storage device 2, an image of a person input via the image acquisition unit 3, or the like.
 The feature extraction unit 15 extracts a feature amount from the query image (referred to as the "pre-change feature amount" in contrast to the post-change feature amount described later) (S104). To extract the feature amount, the feature extraction unit 15 may use a feature extraction model 21 that receives an image and outputs a feature amount vector of the image. Such a feature extraction model 21 is a trained model constructed by having a model learn the relationship between training images and ground-truth information. The feature extraction model 21 may be a model having the structure of a neural network, for example a convolutional neural network (CNN). When the feature extraction model 21 is constructed by training a model such as a CNN, the output of a convolutional layer or a pooling layer can be used as the feature amount. Therefore, the feature extraction model 21 may be a model from which the fully connected layer at the final stage has been removed.
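 As a minimal illustrative sketch (not part of the disclosure), a feature extraction model of this kind could be realized as follows in Python; the torchvision ResNet-50 backbone, the 224x224 input size, and the function name extract_feature are assumptions made only for illustration.

# Sketch: a CNN feature extractor whose final fully connected layer is removed,
# so that the pooled convolutional output serves as the feature amount vector.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the fully connected layer at the final stage
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(person_image: Image.Image) -> torch.Tensor:
    # Returns a pooled CNN feature vector (here 2048-dimensional) for one person image.
    with torch.no_grad():
        x = preprocess(person_image).unsqueeze(0)   # shape (1, 3, 224, 224)
        return backbone(x).squeeze(0)               # shape (2048,)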
 The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the attribute of the object included in the query image belongs (S105). For example, the attribute determination unit 14 assigns one of the plurality of predetermined object attributes to the query image.
 One example of a predetermined object attribute is the orientation of the person appearing in the image. In this case, the attribute determination unit 14 detects the orientation of the person in the query image by comparing the feature amount vector of the query image output by the feature extraction model 21 with feature amount vectors of person images having predetermined orientations. The orientation of the person is, for example, the orientation of the person's face, the orientation of the upper body, the orientation of the lower body, or an orientation determined by combining these pieces of information.
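 As an illustrative sketch only, the comparison with feature amount vectors of person images having predetermined orientations could follow a nearest-prototype rule such as the one below; the cosine-similarity criterion and the dictionary of reference vectors are assumptions, not requirements of the disclosure.

# Sketch: assign the orientation whose reference (prototype) feature vector
# is most similar to the query feature vector.
import numpy as np

def detect_orientation(query_feature: np.ndarray,
                       reference_features: dict) -> str:
    # reference_features maps an orientation label to a prototype feature vector.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(reference_features,
               key=lambda label: cosine(query_feature, reference_features[label]))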
 When determining the orientation of a person as the object attribute, the attribute determination unit 14 may use an orientation detection model that receives a person image and outputs the orientation of the person in that image. Such an orientation detection model is a trained model constructed by having a model learn the relationship between training images and ground-truth information. A known skeleton detector, posture detector, or face orientation detector may also be applied to the attribute determination unit 14.
 The orientation of the person detected in this way can be classified into, for example, eight directions as seen from the person appearing in the image: forward, diagonally forward right, right, diagonally backward right, backward, diagonally backward left, left, and diagonally forward left.
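 Purely for illustration, and assuming an orientation detector that outputs a body yaw angle in degrees (0 meaning facing the camera, increasing clockwise, an assumed convention), the eight classes above could be obtained by quantizing the angle into 45-degree bins:

# Sketch: quantize an estimated yaw angle into the eight orientation classes.
ORIENTATIONS = [
    "forward", "diagonally forward right", "right", "diagonally backward right",
    "backward", "diagonally backward left", "left", "diagonally forward left",
]

def classify_orientation(yaw_deg: float) -> str:
    # Map a yaw angle to one of eight 45-degree orientation bins.
    bin_index = int(((yaw_deg % 360.0) + 22.5) // 45.0) % 8
    return ORIENTATIONS[bin_index]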
 The object attribute determined by the attribute determination unit 14 is not limited to the orientation of a person, and may be an attribute such as the person's height, body type, or hairstyle. A person's height, body type, and hairstyle can be easily estimated from the image itself using, for example, well-known image recognition techniques. The object attribute may also be an attribute representing whether or not the person is wearing clothing of a specific shape, such as a suit. The object attribute may be an attribute representing whether the person is holding a bag, carrying a backpack, pulling a suitcase, talking on the phone, and so on. Furthermore, the object attribute may be an attribute indicating whether the person is riding a vehicle such as a bicycle or motorcycle, whether the person is walking, running, or stationary, whether the person is standing or sitting, and so on.
 The attribute determination unit 14 may detect the posture of the person in the query image and estimate object attributes such as those described above based on the detected posture. Alternatively, the attribute determination unit 14 may use an attribute detection model that receives a person image and outputs the object attribute of that image. Such an attribute detection model is a trained model constructed by having a model learn the relationship between training images and ground-truth information.
 The object attributes determined by the attribute determination unit 14 are not limited to attributes of a person, and may be attributes of an object other than a person. For example, the object attribute may be the color, material, or shape of the object.
 The determination unit 16 determines whether or not the object attribute of the person included in the query image determined in step S103 has changed in the time-series image data group (S106). For example, the determination unit 16 compares the attribute information included in the object attributes of the person appearing in the query image and in a tracking image described later, and determines that the object attribute has changed when the difference in the attribute information is equal to or greater than a threshold, when the classification information included in the attribute information differs, and so on. For example, if the person in the query image is carrying baggage, the query image has a value of 1 as the attribute information concerning baggage, whereas if the person in the tracking image is not carrying baggage, the tracking image has a value of 0 as that attribute information, and the difference between the two pieces of attribute information is 1. Another example of the difference in attribute information is the difference, expressed as an angle, between the direction the person in the query image is facing and the direction the person in the tracking image is facing. When the determination unit 16 determines that the object attribute has changed (Yes in S106), the process proceeds to step S108. In this case, the determination unit 16 has determined to combine the pre-change feature amount with the post-change feature amount described later. When the determination unit 16 determines that the object attribute has not changed (No in S106), the process proceeds to step S107.
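 A minimal sketch of this decision follows, assuming attribute information encoded as binary flags plus an orientation angle and assuming the particular attribute names and thresholds shown; none of these is prescribed by the disclosure.

# Sketch: decide whether the object attribute has changed between the query
# image and a tracking image.
from typing import Dict

def attribute_changed(query_attrs: Dict[str, float],
                      track_attrs: Dict[str, float],
                      flag_threshold: float = 1.0,
                      angle_threshold_deg: float = 45.0) -> bool:
    # Return True if any attribute differs by at least its threshold.
    for key in ("has_baggage", "riding_vehicle"):          # assumed binary attribute flags
        if abs(query_attrs[key] - track_attrs[key]) >= flag_threshold:
            return True
    diff = abs(query_attrs["orientation_deg"] - track_attrs["orientation_deg"]) % 360.0
    diff = min(diff, 360.0 - diff)                          # wrap-around angle difference
    return diff >= angle_threshold_deg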
 When the determination unit 16 determines that the object attribute has not changed (No in S106), the query tracking unit 13 tracks the person indicated by the query image determined in step S103 in the time-series image data group of the image data 50 (S107). For example, based on the position in the image of the person detected or tracked in a specific frame of the image data group, the query tracking unit 13 tracks that person in subsequent frames captured after the specific frame. Therefore, the person indicated by the query image determined in step S103 and the person indicated by the image tracked in step S107 (referred to herein as the "tracking image") are the same person, for example a person having the same identification information (ID). Tracking of an object such as a person can be realized, for example, by storing the object in a specific frame as a template in the storage device 2 and searching within subsequent frames using that template, for example by applying a known technique such as template matching. The attribute determination unit 14 also determines the object attribute of the tracking image as needed. After the tracking process of step S107 is completed, the process returns to step S105.
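 The template matching mentioned above could be sketched as follows; the use of OpenCV's normalized cross-correlation and the 0.7 score threshold are assumptions made only for illustration.

# Sketch: locate the stored person template in a subsequent frame.
import cv2
import numpy as np

def track_person(template: np.ndarray, next_frame: np.ndarray,
                 score_threshold: float = 0.7):
    # Returns the bounding box (x, y, w, h) of the best match, or None if the
    # match score falls below the threshold (e.g. the person left the view).
    result = cv2.matchTemplate(next_frame, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < score_threshold:
        return None
    h, w = template.shape[:2]
    return (max_loc[0], max_loc[1], w, h)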
 When the determination unit 16 determines that the object attribute has changed (Yes in S106), the feature extraction unit 15 extracts the feature amount of the person included in the tracking image after the attribute change (the post-change feature amount) (S108).
 The feature combining unit 17 combines the pre-change feature amount extracted in step S104 with the post-change feature amount extracted in step S108 (S109).
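 The disclosure leaves the combination method open; as one illustrative sketch, the two feature amount vectors could be directly summed (concatenated), or averaged if the original dimensionality should be preserved. Both variants below are assumptions for illustration.

# Sketch: two possible ways to combine the pre-change and post-change features.
import numpy as np

def combine_features(pre_change: np.ndarray, post_change: np.ndarray) -> np.ndarray:
    # Direct sum: the combined vector has twice the original number of dimensions.
    return np.concatenate([pre_change, post_change])

def combine_features_mean(pre_change: np.ndarray, post_change: np.ndarray) -> np.ndarray:
    # Alternative: element-wise average keeps the original number of dimensions.
    return (pre_change + post_change) / 2.0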
 When the feature amounts have been combined in step S109, the matching unit 18 compares the feature amount combined in step S109 with the feature amount of each image in the group of matching target images in the image list 22, thereby matching the person indicated by the query image against the person included in each image in the image list 22. Specifically, the matching unit 18 determines that the person shown in the query image and the person shown in an image in the image list 22 match, for example, when the degree of similarity between the feature amount combined in step S109 and the feature amount of that image is equal to or greater than a predetermined threshold.
 The degree of similarity described above is calculated, for example, by a predetermined similarity calculation algorithm. For example, the matching unit 18 calculates the similarity based on a comparison of feature amount vectors. The predetermined similarity calculation algorithm is, for example, an algorithm that calculates the similarity such that it increases as a distance, such as the Euclidean distance or the Mahalanobis distance, between the feature amount vector combined in step S109 and the feature amount vector of each image in the image list 22 decreases, or, when an inner product is used, as the inner product increases. The predetermined similarity calculation algorithm may also be an algorithm that calculates the distance between a plurality of feature amount vectors by applying a model constructed through metric learning. A larger similarity value means, for example, a higher degree of agreement between the two compared feature amount vectors. The combined feature amount vector may differ in the number of dimensions from the feature amount vector of the comparison target image; in that case, the composition of the feature amount vector of the comparison target image may be adjusted to match the combined feature amount vector. For example, when the two feature amount vectors to be combined are directly summed and the number of dimensions changes compared to before the combination, the feature amount vector of the comparison target image may be duplicated to twice its length so that the numbers of dimensions of the combined feature amount vector and the comparison target image's feature amount vector are unified.
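 A sketch of the dimension adjustment and similarity comparison described above follows; the use of cosine similarity and the 0.6 threshold are assumptions chosen only for illustration.

# Sketch: match a combined (concatenated) feature vector against a gallery
# feature vector, duplicating the gallery vector to unify dimensionality.
import numpy as np

def match_combined(combined: np.ndarray, gallery: np.ndarray,
                   threshold: float = 0.6) -> bool:
    # Return True if the two vectors are judged to show the same person.
    if combined.shape[0] == 2 * gallery.shape[0]:
        gallery = np.tile(gallery, 2)            # unify the number of dimensions
    similarity = np.dot(combined, gallery) / (
        np.linalg.norm(combined) * np.linalg.norm(gallery) + 1e-12)
    return similarity >= threshold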
 It was explained above that in step S106, when the object attribute of the person included in the query image or the tracking image does not change, the process returns to step S105 via step S107. However, the control unit 1 may end the processing of FIG. 3 once the tracking process has been completed for the entire time-series image data group, even if the object attribute of the person included in the tracking image has not changed. Alternatively, when the object attribute of the person included in the tracking image does not change even after the tracking process has been completed for the entire time-series image data group, the matching unit 18 may match the person indicated by the query image against the person included in each image in the image list 22 by comparing the feature amount of the query image extracted in step S104 (the pre-change feature amount) with the feature amount of each image in the image list 22.
 After executing the subsequent processes of steps S104 to S110 for the query image determined in step S103, the query determination unit 12 may determine another of the plurality of person images detected in step S102 as the query image and then execute the subsequent processes of steps S103 to S110. In this way, the control unit 1 may repeatedly execute the processing of FIG. 3 so that it is executed for every person appearing in the image data 50.
 FIG. 4 is a schematic diagram for explaining the image processing of the image processing apparatus 100. FIG. 4 shows time-series image data 50 captured by the camera 6. The control unit 1 detects a person in the image data 50 (S102) and sets the detected person image as the query image 50c (S103). At the same time or thereafter, the control unit 1 extracts the pre-change feature amount from the query image 50c (S104) and detects the object attribute of the person included in the query image 50c (S105). In the example of FIG. 4, the query image 50c has the object attribute "backward facing".
 In the first pass of the image processing flow, the object attribute of the query image 50c has not changed (No in S106), so the control unit 1 tracks the person indicated by the query image 50c in the time-series image data 50 and detects a tracking image 50d (S107). Next, the control unit 1 detects the object attribute of the person included in the tracking image 50d (S105). In the example shown in FIG. 4, the person included in the tracking image 50d has the "backward facing" object attribute, so the process again proceeds to No in step S106. In this manner, the control unit 1 repeats the loop until the object attribute of the person included in the tracking image changes, and further detects tracking images 50e and 50f. The person included in the tracking image 50e also has the "backward facing" object attribute, whereas the person included in the tracking image 50f has the "forward facing" object attribute. When the control unit 1 detects the object attribute of the person included in the tracking image 50f, it determines that the object attribute has changed from "backward facing" to "forward facing" (Yes in S106).
 Therefore, the control unit 1 extracts the feature amount of the tracking image 50f as the post-change feature amount (S108) and combines the pre-change feature amount with the post-change feature amount (S109). Next, the control unit 1 matches the query image against each image in the image list 22 by comparing the combined feature amount with the feature amount of each image in the group of matching target images in the image list 22 (S110). As a method of matching the combined feature amount vector against the feature amount vector of each image in the image list 22, for example, the inner product of the combined feature amount vector and the feature amount vector obtained from the comparison target image may be calculated, and the objects in the two images may be judged to be the same (for example, the same person) if the inner product is equal to or greater than a certain threshold, and different (for example, different persons) if it is below the threshold. In this way, the image processing apparatus 100 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person.
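 For illustration only, the flow of FIG. 4 can be condensed with toy 4-dimensional vectors; the numeric values and the 0.5 inner-product threshold are assumptions with no significance beyond the example.

# Sketch: pre-change and post-change features are combined and matched against
# one gallery feature by an inner product with a threshold.
import numpy as np

pre_change  = np.array([0.9, 0.1, 0.0, 0.2])   # feature of backward-facing query image 50c
post_change = np.array([0.1, 0.8, 0.3, 0.0])   # feature of forward-facing tracking image 50f
combined = np.concatenate([pre_change, post_change])          # S109

gallery_feature = np.array([0.5, 0.5, 0.1, 0.1])              # one image in image list 22
gallery_matched = np.tile(gallery_feature, 2)                 # unify dimensionality
score = float(np.dot(combined, gallery_matched))              # inner product (S110)
same_person = score >= 0.5                                    # threshold comparison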
2-3. Effects, etc.
 As described above, the image processing apparatus 100 includes the image acquisition unit 3, the person detection unit 11 as an example of an object detection unit, the query determination unit 12, the query tracking unit 13, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2. The image acquisition unit 3 receives image data 50 including a time-series image data group. The person detection unit 11 detects a person, as an example of an object, in the image data group. The query determination unit 12 determines an image including one of the plurality of person images detected by the person detection unit 11 as the query image. The query tracking unit 13 searches for and tracks the person indicated by the query image in chronological order in the image data group. The storage device 2 stores an image list 22 including a group of matching target images against which the person included in the query image is matched. The feature extraction unit 15 extracts feature amounts from the query image determined by the query determination unit 12, the tracking images tracked by the query tracking unit 13, and the group of matching target images. The attribute determination unit 14 determines the object attribute of the person included in the query image and the object attribute of the person included in each tracking image. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of at least two images among the query image and the tracking images. When the determination unit 16 determines to combine the feature amounts of at least two images among the query image and the tracking images, the feature combining unit 17 combines the feature amounts of those at least two images.
 With this configuration, the image processing apparatus 100 can combine the features of a plurality of images (the query image and the tracking images) showing the same person, and can extract the features of the person appearing in the images with higher accuracy than the conventional technology.
 The determination unit 16 may determine to combine the feature amounts of the at least two images when the at least two images show the same person and the object attributes of the at least two images are different from each other.
 With this configuration, the image processing apparatus 100 can combine the features of a plurality of images of the same person having different object attributes, and can extract the features of the person appearing in the images more accurately and efficiently.
 The image processing apparatus 100 may further include a matching unit 18 that, when the feature amounts of the at least two images have been combined by the feature combining unit 17, matches the combined feature amount against the feature amounts of images in the image list 22, which are images other than the at least two images.
 With this configuration, the image processing apparatus 100 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person.
 Specifically, the matching unit 18 may determine that the object shown in the at least two images and the object shown in an image other than the at least two images match when the degree of similarity between the combined feature amount and the feature amount of the image other than the at least two images is equal to or greater than a predetermined threshold.
3. Second Embodiment
3-1. Configuration
 FIG. 5 is a block diagram showing a configuration example of the image processing apparatus 200 according to the second embodiment of the present disclosure. Compared with the image processing apparatus 100 according to the first embodiment, the storage device 2 stores an image list 222 instead of the image list 22. The image list 222 includes a plurality of images of the same person with mutually different object attributes (see FIG. 7). Each image in the image list 222 is associated with, for example, the ID of the person shown in that image.
3-2. Operation
 FIG. 6 is a flowchart illustrating the procedure of the image processing executed by the control unit 1 of the image processing apparatus 200 according to the second embodiment.
 The control unit 1 acquires image data 250 via the image acquisition unit 3 (S201). Unlike the image data 50 of the first embodiment, which includes a time-series image data group, the image data 250 may be, for example, a single image.
 The person detection unit 11 detects a person in the image data 250 acquired in step S201 (S202). The query determination unit 12 determines a query image (S203). For example, the query determination unit 12 determines one of the one or more person images detected in step S202 as the query image.
 The feature extraction unit 15 extracts the feature amount of each image in the image list 222 (S204). The attribute determination unit 14 detects to which of a plurality of predetermined object attributes the object attribute of the person included in each image of the image list 222 belongs (S205).
 The determination unit 16 determines whether or not the image list 222 contains a plurality of images that show the same person but belong to mutually different object attributes (S206). When the determination unit 16 determines that there is such a plurality of images (Yes in S206), the process proceeds to step S207; when it determines that there is not (No in S206), the process proceeds to step S208. For example, when a plurality of images in the image list 222 show the same person, some of those images show the person facing forward, and other images show the person facing backward, the determination unit 16 determines that there is a plurality of images showing the same person but belonging to mutually different object attributes.
 When the process proceeds to Yes in step S206, the feature combining unit 17 combines the feature amounts of the plurality of images that show the same person but belong to mutually different object attributes (S207). The feature amount of each of those images has already been extracted in step S204.
 When the feature amounts have been combined in step S207, the matching unit 18 matches each image in the image list 222 against the query image by comparing the feature amount combined in step S207 with the feature amount of the query image (S208). When the process proceeds to No in step S206 and no feature amounts are combined in step S207, the matching unit 18 matches each image in the image list 222 against the query image by comparing the feature amount of each image in the image list 222 with the feature amount of the query image (S208).
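 A sketch of steps S204 to S207 on the image list side is given below, assuming the image list is available as (person ID, object attribute, feature vector) tuples and that per-person averaging is used as the combination method; both are assumptions, since the disclosure leaves the combination method open.

# Sketch: group the image list 222 by person ID and combine the features of
# images of the same person that have different object attributes.
from collections import defaultdict
from typing import Dict, List, Tuple
import numpy as np

def combine_gallery_features(
        gallery: List[Tuple[str, str, np.ndarray]]  # (person_id, attribute, feature)
) -> Dict[str, np.ndarray]:
    # Return one combined feature vector per person ID.
    grouped: Dict[str, List[np.ndarray]] = defaultdict(list)
    attributes: Dict[str, set] = defaultdict(set)
    for person_id, attribute, feature in gallery:
        grouped[person_id].append(feature)
        attributes[person_id].add(attribute)
    combined: Dict[str, np.ndarray] = {}
    for person_id, feats in grouped.items():
        if len(attributes[person_id]) >= 2:          # different attributes: combine
            combined[person_id] = np.mean(feats, axis=0)
        else:                                        # single attribute: keep as-is
            combined[person_id] = feats[0]
    return combined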
 FIG. 7 is a schematic diagram for explaining the image processing of the image processing apparatus 200 according to the second embodiment. FIG. 7 shows image data 250 captured by the camera 6. The camera 6 is installed, for example, on the premises of a building, a factory, or the like.
 In FIG. 7, the image list 222 includes a plurality of images of the same person with mutually different object attributes. FIG. 7 shows an example in which the image list 222 includes a forward-facing image of person X, a backward-facing image of person X, a forward-facing image of person Y, a backward-facing image of person Y, a forward-facing image of person Z, and a backward-facing image of person Z. The image list 222 is, for example, an employee image database that includes images of employees who may work on the premises.
 The control unit 1 detects a person in the image data 250 (S202) and sets the detected person image as the query image (S203). The control unit 1 extracts the feature amount of each image in the image list 222 (S204) and determines the object attribute of each image (S205). In the example shown in FIG. 7, the control unit 1 determines that the image list 222 contains a plurality of images that show the same person but belong to mutually different object attributes (Yes in S206), and combines the feature amount of the forward-facing image of person X with the feature amount of the backward-facing image of person X. The feature amounts of the images of persons Y and Z are combined in the same manner (S207). Next, the control unit 1 matches each image in the image list 222 against the query image by comparing the combined feature amounts with the feature amount of the query image (S208).
 In this way, the image processing apparatus 200 can perform matching with high accuracy by utilizing the features of a plurality of images showing the same person that are included in the image list 222. Furthermore, according to the image processing apparatus 200, since the feature amounts of a plurality of images in the image list 222 are consolidated by combination, comparing the image captured by the camera 6 against every image in the image list 222 can be avoided, which reduces the number of comparisons and the amount of computation. This also leads to an improvement in processing speed.
3-3. Effects, etc.
 As described above, the image processing apparatus 200 includes the image acquisition unit 3, the person detection unit 11 as an example of an object detection unit, the query determination unit 12, the attribute determination unit 14, the feature extraction unit 15, the determination unit 16, the feature combining unit 17, and the storage device 2. The image acquisition unit 3 receives image data 250. The person detection unit 11 detects a person, as an example of an object, in the image data 250. The query determination unit 12 determines an image including one of the person images detected by the person detection unit 11 as the query image. The storage device 2 stores an image list 222 including a group of matching target images against which the person included in the query image is matched. The feature extraction unit 15 extracts the feature amount of each image in the group of matching target images. The attribute determination unit 14 determines the object attribute of the object included in each image of the group of matching target images. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of at least two images in the group of matching target images. When the determination unit 16 determines to combine the feature amounts of at least two images in the group of matching target images, the feature combining unit 17 combines the feature amounts of those at least two images.
 With this configuration, the image processing apparatus 200 can combine the features of a plurality of images showing the same person, and can extract the features of the person appearing in the images with higher accuracy than the conventional technology.
 The image processing apparatus 200 may further include a matching unit 18 that, when the feature amounts of the at least two images have been combined by the feature combining unit 17, matches the combined feature amount against the feature amount of the query image, which is an image other than the at least two images.
 With this configuration, the image processing apparatus 200 can perform matching with high accuracy by utilizing the features of a plurality of matching target images showing the same person.
(Other Embodiments)
 As described above, the embodiments have been described as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited to these, and is also applicable to embodiments in which modifications, replacements, additions, omissions, and the like are made as appropriate. Other embodiments are exemplified below.
 In the first and second embodiments, a configuration was described in which a query image detected in image data input from the outside is matched against the matching target images included in the image list in the storage device 2. In that configuration, examples were described in which at least one of the two kinds of features, namely the features of a plurality of query-side images and the features of a plurality of matching target images, is combined. However, the embodiments of the present disclosure are not limited to these, and any configuration that combines the feature amounts of at least two images may be used.
 For example, in an image processing apparatus according to another embodiment of the present disclosure, the feature extraction unit 15 extracts feature amounts from a plurality of images, each of which shows the object. The attribute determination unit 14 determines to which of a plurality of predetermined object attributes the object attributes of at least two of the plurality of images belong. The determination unit 16 determines, based on the object attributes determined by the attribute determination unit 14, whether or not to combine the feature amounts of the at least two images. For example, the determination unit 16 compares the attribute information included in the object attributes of the at least two images and determines to combine the feature amounts when the difference in the attribute information is equal to or greater than a threshold. When the determination unit 16 determines to combine the feature amounts of the at least two images, the feature combining unit 17 combines the feature amounts of the at least two images.
 As described above, the embodiments have been described as examples of the technology of the present disclosure. To that end, the accompanying drawings and the detailed description have been provided.
 Therefore, the components described in the accompanying drawings and the detailed description may include not only components essential for solving the problem, but also components that are not essential for solving the problem and are included merely to exemplify the above technology. Accordingly, the mere fact that those non-essential components are described in the accompanying drawings and the detailed description should not immediately lead to the conclusion that they are essential.
 In addition, since the above-described embodiments are intended to exemplify the technology of the present disclosure, various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.
 The present disclosure is applicable to image processing technology, for example image search technology and image matching technology.
1 control unit
2 storage device
3 image acquisition unit
4 output interface
5 input interface
6 camera
11 person detection unit
12 query determination unit
13 query tracking unit
14 attribute determination unit
15 feature extraction unit
16 determination unit
17 feature combining unit
18 matching unit
21 feature extraction model
22, 222 image list
50, 250 image data
100, 200 image processing apparatus

Claims (10)

  1.  An image processing device comprising:
     a feature extraction unit that extracts, from a plurality of images each showing the same object, a feature amount indicating a feature of the object;
     an attribute determination unit that determines an object attribute of the object included in at least two images among the plurality of images; and
     a determination unit that determines, based on the object attributes determined by the attribute determination unit, to combine the feature amounts of the at least two images when the object attributes of the at least two images are different from each other.
  2.  The image processing device according to claim 1, further comprising a feature combining unit that combines the feature amounts of the at least two images when the determination unit determines to combine the feature amounts of the at least two images.
  3.  The image processing device according to claim 2, further comprising a matching unit that, when the feature amounts of the at least two images have been combined by the feature combining unit, matches the combined feature amount against a feature amount of an image other than the at least two images.
  4.  The image processing device according to claim 3, wherein the matching unit determines that the object shown in the at least two images and an object shown in the image other than the at least two images match when a degree of similarity between the combined feature amount and the feature amount of the image other than the at least two images is equal to or greater than a predetermined threshold.
  5.  The image processing device according to any one of claims 1 to 4, further comprising:
     an image acquisition unit that receives a time-series image data group;
     an object detection unit that detects at least one object in the image data group;
     a query determination unit that determines an image including one of the at least one object detected by the object detection unit as a query image;
     a query tracking unit that searches for and tracks, in chronological order in the image data group, the object indicated by the query image; and
     a storage device that stores an image list including a group of matching target images against which the object included in the query image is matched,
     wherein the feature extraction unit extracts feature amounts from the query image determined by the query determination unit, the tracking image tracked by the query tracking unit, and the group of matching target images, and
     the attribute determination unit determines the object attribute of the object included in the query image and the object attribute of the object included in the tracking image as the object attributes of the objects included in the at least two images.
  6.  The image processing device according to any one of claims 1 to 4, further comprising:
     an image acquisition unit that receives image data;
     an object detection unit that detects at least one object in the image data;
     a query determination unit that determines an image including one of the at least one object detected by the object detection unit as a query image; and
     a storage device that stores an image list including a group of matching target images against which the object included in the query image is matched,
     wherein the feature extraction unit extracts a feature amount of each image in the group of matching target images, and
     the attribute determination unit determines the object attribute of the object included in each image of the group of matching target images as the object attributes of the objects included in the at least two images.
  7.  The image processing device according to any one of claims 1 to 6, wherein the object is a person, and the object attributes include a plurality of orientations of the person.
  8.  An image processing method comprising:
     a feature extraction step of extracting, from a plurality of images each showing the same object, a feature amount indicating a feature of the object;
     an attribute determination step of determining an object attribute of the object included in at least two images among the plurality of images; and
     a determination step of determining, based on the object attributes determined in the attribute determination step, to combine the feature amounts of the at least two images when the object attributes of the at least two images are different from each other.
  9.  The image processing method according to claim 8, further comprising a feature combining step of combining the feature amounts of the at least two images when it is determined in the determination step to combine the feature amounts of the at least two images.
  10.  A program for causing a control unit to execute the image processing method according to claim 8 or 9.
PCT/JP2022/024400 2021-07-09 2022-06-17 Image processing device, image processing method, and program WO2023282033A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-114127 2021-07-09
JP2021114127 2021-07-09

Publications (1)

Publication Number Publication Date
WO2023282033A1 true WO2023282033A1 (en) 2023-01-12

Family

ID=84800240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/024400 WO2023282033A1 (en) 2021-07-09 2022-06-17 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2023282033A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018120644A (en) * 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program
JP2020095757A (en) * 2020-03-23 2020-06-18 キヤノン株式会社 Information processing device, information processing method, and program
JP2021101384A (en) * 2017-12-18 2021-07-08 株式会社東芝 Image processing apparatus, image processing method and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021101384A (en) * 2017-12-18 2021-07-08 株式会社東芝 Image processing apparatus, image processing method and program
JP2018120644A (en) * 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program
JP2020095757A (en) * 2020-03-23 2020-06-18 キヤノン株式会社 Information processing device, information processing method, and program

Similar Documents

Publication Publication Date Title
Jegham et al. Vision-based human action recognition: An overview and real world challenges
An et al. Performance evaluation of model-based gait on multi-view very large population database with pose sequences
Gong et al. Structured time series analysis for human action segmentation and recognition
Dantone et al. Human pose estimation using body parts dependent joint regressors
Pu et al. Facial expression recognition from image sequences using twofold random forest classifier
JP5682563B2 (en) Moving object locus identification system, moving object locus identification method, and moving object locus identification program
Pazhoumand-Dar et al. Joint movement similarities for robust 3D action recognition using skeletal data
Kusakunniran et al. Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model
Chattopadhyay et al. Frontal gait recognition from incomplete sequences using RGB-D camera
JP2021101384A (en) Image processing apparatus, image processing method and program
Križaj et al. Adaptation of SIFT features for face recognition under varying illumination
Haber et al. A practical approach to real-time neutral feature subtraction for facial expression recognition
Shyam et al. A taxonomy of 2D and 3D face recognition methods
Zhu et al. Human action recognition using multi-layer codebooks of key poses and atomic motions
Hsu et al. Fast landmark localization with 3D component reconstruction and CNN for cross-pose recognition
Xia et al. Face occlusion detection using deep convolutional neural networks
Wang et al. A new hand gesture recognition algorithm based on joint color-depth superpixel earth mover's distance
Tathe et al. Human face detection and recognition in videos
Abedi et al. Modification of deep learning technique for face expressions and body postures recognitions
WO2023282033A1 (en) Image processing device, image processing method, and program
JP7434914B2 (en) Pedestrian object detection device and method, electronic equipment
Zhang et al. A review of human action recognition in video
Elaoud et al. Analysis of skeletal shape trajectories for person re-identification
dos Santos Jangua et al. Human Identification Based on Gait and Soft Biometrics
Khokhlova et al. 3D visual-based human motion descriptors: a review

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22837451

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE