WO2021181642A1 - Collation assistance device, collation assistance method, and program storage medium - Google Patents
Collation assistance device, collation assistance method, and program storage medium
- Publication number
- WO2021181642A1 (PCT/JP2020/010980)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- lens
- input image
- dimensional
- collation
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- the present invention relates to a collation assisting device, a collation assisting method, and a program storage medium.
- Patent Document 1 discloses a technique that, when a user visually collates a three-dimensional image of a person registered in a database with a captured person, changes the orientation and size of the three-dimensional image to make collation with the captured two-dimensional image easier.
- In the technique described in Patent Document 1, collation by the user is facilitated by changing the orientation and size of the three-dimensional image registered in the database. However, when the person shown in the captured image wears spectacles, the position and size of the eyes seen through the lenses differ from those of the naked eye because of lens refraction. Since this effect becomes more pronounced depending on the orientation of the person's face, the accuracy with which the user can visually judge whether the two-dimensional image and the three-dimensional image show the same person is reduced. Patent Document 1 does not consider this situation.
- According to one aspect, a collation assisting device is provided that includes: a detection unit that detects an input image including a person wearing a lens; and an output unit that outputs a corrected image, which is an image in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
- According to another aspect, a collation assisting method is provided in which an input image including a person wearing a lens is detected, and a corrected image, in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, is output together with a three-dimensional registered image that is a collation target of the corrected image.
- According to another aspect, a program storage medium is provided that stores a program causing a computer to execute: a process of detecting, in at least one input image of a person, an input image including a person wearing a lens; and a process of outputting a corrected image, in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, together with a three-dimensional registered image that is a collation target of the corrected image.
- According to another aspect, a collation assisting device is provided that includes: a detection unit that detects an input image including a person wearing glasses having a lens; and an output unit that outputs a three-dimensional corrected image, in which the eye position and size of the three-dimensional registered image to be collated with the input image are corrected using the power of the lens estimated from the input image, together with the input image.
- According to another aspect, a collation assisting method is provided in which an input image including a person wearing glasses having a lens is detected, and a three-dimensional corrected image, obtained by correcting the eye position and size of the three-dimensional registered image to be collated with the input image using the power of the lens estimated from the input image, is output together with the input image.
- a program storage medium for storing images is provided.
- According to these aspects, a collation assisting device, a collation assisting method, and a program are provided that can improve collation accuracy by reducing the changes in appearance caused by wearing eyeglasses.
- FIG. 1 shows an overall configuration example of the collation assisting system 1 according to the present embodiment.
- the collation assistance system is an information processing system including a processing device 100, an image pickup device 200, and a user terminal 300. Each device and terminal is connected via a network.
- a specific example of the situation shown in FIG. 1 is a situation in which a person is photographed by a camera installed on the street.
- the image pickup device 200 is a terminal that captures a person and obtains a captured image, for example, a security camera installed on a street.
- the image pickup device 200 takes an image of a person passing through the image pickup section, and outputs the captured image to the processing device 100.
- FIG. 2 is a functional block diagram showing the configuration of the collation assistance system 1.
- the collation assistance system 1 includes a processing device 100, an image pickup device 200, and a user terminal 300.
- the processing device 100 includes an input unit 110, a detection unit 120, an extraction unit 130, a power estimation unit 140, an image processing unit 150, a storage unit 160, a collation unit 170, and an output unit 180.
- the imaging device 200 includes an imaging unit 210.
- the user terminal 300 includes a display unit 310 and an operation reception unit 320.
- the imaging unit 210 included in the imaging device 200 images a person passing through the imaging section and outputs the captured image to the processing device 100.
- the imaging unit 210 images the imaging section in time series according to, for example, a set frame rate.
- the image pickup unit 210 is not limited to the above specific example: it may capture an image at a timing instructed from the outside, or may capture a still image at a predetermined timing.
- the captured image may include at least the head of the person and may not include the whole body of the person.
- the input unit 110 receives the input of the captured image output by the imaging unit 210.
- the target (input image) for which the input unit 110 accepts the input may be all the captured images captured by the imaging unit 210, or may be a part of the captured images.
- the captured images may be extracted at predetermined time intervals and sequentially received for input.
- as one example, captured images may be obtained by thinning out some of the images captured by the imaging unit 210 at a predetermined frame rate.
- the target accepted by the input unit 110 is not limited to the above.
- the detection unit 120 detects the human head and eyeglasses from the captured image received by the input unit 110.
- for this detection, a trained model obtained by machine learning on images of heads wearing spectacles is used, for example.
- as the machine learning method, deep learning with a multi-layer neural network may be used, for example.
- the method used to detect the wearing of a person's head and eyeglasses is not limited to the above.
- the detection unit 120 may detect the human head and the wearing of spectacles in a stepwise manner, or may detect each independently. In the stepwise case, the wearing of eyeglasses may be detected after the human head is detected. The detection unit 120 does not necessarily have to detect the head and may detect only the eyeglasses.
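As a minimal illustrative sketch of this two-stage detection, the following Python snippet uses OpenCV's bundled Haar cascades as stand-ins; the patent itself assumes a deep-learning model trained on images of heads wearing spectacles, and the eye-with-glasses cascade here is only a rough proxy for a true glasses classifier.

```python
import cv2

# Stand-in detectors bundled with OpenCV; the patent instead assumes a model
# trained by deep learning on images of heads wearing spectacles.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
glasses_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

def detect_head_and_glasses(frame_bgr):
    """Stepwise detection: first the head (face), then eyes behind glasses
    inside each face region, mirroring the two-stage option in the text."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        roi = gray[y:y + h, x:x + w]
        eyes = glasses_cascade.detectMultiScale(roi, 1.1, 5)
        results.append({"face": (x, y, w, h), "glasses": len(eyes) > 0})
    return results
```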
- the detection unit 120 detects an image in which the contour of the face is reflected inside the lens of the spectacles from the captured images that have detected the wearing of the spectacles.
- FIG. 3 is an example showing the state of the face in which the outline of the face is reflected inside the lens of the spectacles.
- FL in the figure shows the outline of a person's face
- GL1 and GL2 show a lens.
- VFL shows the contour of the face reflected inside the lens.
- FIG. 3A shows a state in which the person's face faces the front of the imaging unit 210 (the tilt angle of the face is 0 degrees).
- FIG. 3B shows a state in which the face does not face the front of the imaging unit 210 (the tilt angle of the face is not 0 degrees).
- FIG. 3B shows, as an example, a state in which a person faces to the right.
- the state in which the face faces the front is a state in which the tilt angle of the face is close to 0 degrees with respect to the imaging direction and is smaller than a predetermined threshold value.
- the state in which the face is not facing the front is a state in which the inclination angle of the face with respect to the imaging direction is larger than a predetermined threshold value.
- a predetermined threshold value is set arbitrarily. As shown in FIG. 3A, when the face is oriented toward the front of the imaging unit 210 (the tilt angle of the face is 0 degrees), the contour of the face is unlikely to be reflected inside the lenses GL1 and GL2. On the other hand, as shown in FIG. 3B, when the face is not oriented toward the front of the imaging unit 210 (the tilt angle of the face is not 0 degrees), the contour of the face is more likely to be reflected inside the lens GL1 because of the refraction phenomenon, compared with the case where the face is oriented toward the front.
- the detection unit 120 estimates the tilt angle of the face with respect to the imaging direction for the face reflected in the captured image.
- the tilt angle of the face is an angle at which the face of the person is facing with reference to a straight line passing through the imaging unit 210 and the head of the person in the three-dimensional space. Specifically, when the face of the person faces the imaging unit 210, the inclination angle is close to 0 degrees and smaller than a predetermined threshold value.
- the image of the target for which the detection unit 120 detects the face may be a two-dimensional image extracted by the extraction unit 130, which will be described later.
- the extraction unit 130 extracts a captured image in which the outline of the face is reflected inside the lens as a two-dimensional image. As illustrated in FIG. 3B, the extraction unit 130 extracts an image in which the contour of the face is reflected inside the lens. In other words, the extraction unit 130 extracts an image in which the person's face is oriented obliquely, so that the outline of the face is reflected inside the lens when viewed from the image pickup unit 210.
- the extraction unit 130 may extract an image in which the inclination angle of the face is equal to or more than the first threshold value arbitrarily set, or an image in which the inclination angle of the face is equal to or less than the second threshold value arbitrarily set.
- the second threshold value may be set to a value larger than the first threshold value.
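A minimal sketch of this tilt-angle filtering follows; the concrete threshold values are hypothetical, since the text only says they are set arbitrarily, with the second threshold larger than the first.

```python
# Hypothetical thresholds; the patent only says they are set arbitrarily,
# with the second threshold larger than the first.
FIRST_THRESHOLD_DEG = 15.0   # minimum tilt so the contour appears inside the lens
SECOND_THRESHOLD_DEG = 45.0  # maximum tilt so the face is still usable

def extract_candidates(frames_with_tilt):
    """Keep frames whose estimated face tilt angle lies between the first
    and second thresholds."""
    return [frame for frame, tilt_deg in frames_with_tilt
            if FIRST_THRESHOLD_DEG <= tilt_deg <= SECOND_THRESHOLD_DEG]
```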
- the “extraction” executed by the extraction unit 130 refers to a process of extracting a part of the captured images from a plurality of captured images. Specifically, from a plurality of captured images captured in time series according to a predetermined frame rate, only the captured images in which the wearing of eyeglasses is detected may be extracted.
- the “extraction” may be a concept including a process of cutting out a part of the captured image. For example, in the captured image including a plurality of people, one area in which the head wearing glasses is reflected may be cut out, or one area in which the person is reflected may be cut out.
- the process pointed to by "extraction” is not limited to the above specific example.
- the power estimation unit 140 estimates the power of the glasses worn by a person in the two-dimensional image extracted by the extraction unit 130.
- the details of the power estimation unit 140 will be described with reference to FIG. 4, which is a functional block diagram of the power estimation unit 140.
- the power estimation unit 140 includes a calculation unit 141, a learning model storage unit 142, and an estimation unit 143.
- the calculation unit 141 calculates the distance between the face of the person imaged by the imaging unit 210 and the imaging unit 210 in the three-dimensional space.
- for the calculation of the distance, for example, the size of the region showing the face in the captured image or the length of the region showing the person may be used; however, the method of calculating the distance is not limited to this, and those skilled in the art can apply known techniques as appropriate.
- the calculation unit 141 also calculates the difference between the positions of the contour of the face seen inside and outside the lens (referred to in this specification as the "contour difference").
- the concept of contour difference will be described in detail with reference to FIG.
- FIG. 5 is a diagram schematically showing a part of a face to which a lens is attached.
- the calculation unit 141 calculates the horizontal inter-pixel distance (first scanning distance E1) between the horizontal position of the center point of the eye and the position E4 of the contour of the face in the region inside the lens.
- the calculation unit 141 calculates the horizontal inter-pixel distance (second scanning distance E2) between the horizontal position of the center point of the eye and the position E3 of the contour of the face in the region outside the lens.
- the calculation unit 141 calculates the contour difference (normalized contour difference) obtained by normalizing the difference between the first scanning distance E1 and the second scanning distance E2 using the distance between the imaging unit 210 and the face.
- the specific method of normalization is not limited, and for example, the difference between the first scanning distance E1 and the second scanning distance E2 may be normalized by dividing by the distance between the imaging unit 210 and the face. In normalizing the contour difference, any calculation formula may be used as long as the distance between the imaging unit 210 and the face is taken into consideration.
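As a minimal sketch, the normalization the text gives as an example (dividing the difference of the two scanning distances by the camera-to-face distance) looks like this; the function and parameter names are illustrative only.

```python
def normalized_contour_difference(e1_px, e2_px, face_distance_m):
    """E1: horizontal pixel distance from the eye centre to the face contour
    seen INSIDE the lens; E2: the same distance to the contour seen OUTSIDE
    the lens. The difference is normalized by the camera-to-face distance,
    the example normalization given in the text."""
    if face_distance_m <= 0:
        raise ValueError("camera-to-face distance must be positive")
    return (e1_px - e2_px) / face_distance_m
```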
- the calculation unit 141 calculates the incident angle of the light incident on the imaging unit 210 from an object reflected in an arbitrary pixel on the captured image.
- An example of a method of calculating the incident angle θ of the light arriving at the imaging unit 210 from the real object corresponding to an arbitrary pixel position in the captured image will be described in detail with reference to FIGS. 6 and 17.
- FIG. 6 is a diagram schematically showing an optical system of an imaging unit 210 that images a person.
- the CCD (Charge-Coupled Device) sensor CS detects light incident from a person via the camera lens P.
- calculation formula (1) in FIG. 17 shows the derivation of the incident angle θ. The variables constituting calculation formula (1) are described below.
- the incident angle θ can be expressed by calculation formula (1) shown in FIG. 17, using as variables the first pixel distance xs from the center position of the captured image (which shows the entire imaging range of the imaging unit 210), the second pixel distance c from that center position to the center of the face reflected in the captured image, the number of pixels XL of the captured image, the face orientation detected by the detection unit 120, and the angle of view of the imaging unit 210.
- the calculation unit 141 calculates the incident angle at the position of the contour of the face reflected in the region inside the lens. For the calculation, for example, the calculation formula (1) shown in FIG. 17 can be used.
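The patent's exact formula (1) appears only in FIG. 17 and is not reproduced in the text, so the following is not that formula; it is a generic pinhole-camera approximation, under assumed variable names, of how a pixel offset maps to an incidence angle.

```python
import math

def incident_angle_deg(xs_px, xl_px, view_angle_deg):
    """Generic pinhole approximation (NOT the patent's formula (1)): a pixel
    xs_px from the centre of an image XL pixels wide, taken with the given
    horizontal angle of view, corresponds to this incidence angle."""
    # Focal length in pixels for a pinhole camera with this angle of view.
    focal_px = (xl_px / 2.0) / math.tan(math.radians(view_angle_deg) / 2.0)
    return math.degrees(math.atan(xs_px / focal_px))
```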
- the learning model storage unit 142 stores a regression model that outputs the power of the lens when the incident angle and the normalized contour difference are input.
- the regression model is a trained model in which the combination of the incident angle and the normalized contour difference and the dioptric power of the lens is learned in advance.
- the estimation unit 143 inputs the incident angle and contour difference calculated by the calculation unit 141 into the regression model stored in the learning model storage unit 142, and estimates the power of the lens.
- the regression model described above may be learned to input the incident angle, the contour difference before normalization, and the distance between the imaging unit 210 and the face, and output the power of the lens. In this case, since it is not necessary to normalize the contour difference, the processing load can be reduced.
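A minimal sketch of the regression step follows. The feature-to-diopter relation and the training data here are synthetic placeholders, since the patent only states that combinations of incident angle and normalized contour difference were learned against dioptric power in advance; scikit-learn's random forest is an arbitrary choice of regressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder data standing in for the pre-learned combinations of
# (incident angle [deg], normalized contour difference) -> dioptric power.
rng = np.random.default_rng(0)
X_train = rng.uniform([0.0, 0.0], [40.0, 0.05], size=(500, 2))
y_train = -200.0 * X_train[:, 1] - 0.01 * X_train[:, 0]  # made-up relation

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)  # plays the role of the learning model storage unit 142

def estimate_power(incident_angle_deg, norm_contour_diff):
    """Estimation unit 143: feed the two features into the stored regression
    model and return the estimated lens power in diopters."""
    return float(model.predict([[incident_angle_deg, norm_contour_diff]])[0])
```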
- In the example of the power estimation unit 140 described above, the power of the lens is estimated using the positional relationship between the contour seen inside the lens and the contour seen outside the lens in the captured image, together with the positional relationship between the imaging unit 210 that photographed the person wearing the lens and that person.
- alternatively, the power estimation unit 140 may use, for example, the method described in Patent Document 2 (Japanese Unexamined Patent Publication No. 2015-25859), which estimates the power of the lens based on the position of the contour of the face seen inside the lens of the spectacles, the position of the contour of the face seen outside the lens, and the tilt angle of the face.
- the method for estimating the power of the lens is not limited to the above.
- the image processing unit 150 corrects the two-dimensional image using the power information of the lens estimated by the power estimation unit 140, and generates a corrected image corresponding to the two-dimensional image.
- the area to be corrected is the area inside and around the lens in the image, and the contour of the face and the position and size of the eyes of the person reflected in the area in the lens of which the power is estimated are corrected. Specifically, the area reflected inside the lens is corrected so as to be enlarged so that the contour reflected inside the lens and the contour reflected outside the lens are located on the same curve.
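A minimal sketch of this enlargement, assuming the lens region has already been located as a bounding box and the scale factor has been derived from the estimated power (both assumptions; the patent does not specify either step):

```python
import cv2

def correct_lens_region(image, lens_box, scale):
    """Enlarge the region seen inside the lens (scale > 1 for a minus lens,
    which makes eyes look smaller) so the contour inside the lens lines up
    with the contour outside it. lens_box = (x, y, w, h)."""
    assert scale >= 1.0, "this sketch only handles enlargement"
    x, y, w, h = lens_box
    roi = image[y:y + h, x:x + w]
    enlarged = cv2.resize(roi, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_LINEAR)
    # Paste the centre of the enlarged patch back into the original window.
    eh, ew = enlarged.shape[:2]
    oy, ox = (eh - h) // 2, (ew - w) // 2
    out = image.copy()
    out[y:y + h, x:x + w] = enlarged[oy:oy + h, ox:ox + w]
    return out
```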
- the storage unit 160 stores three-dimensional images (three-dimensional registered images) of a plurality of registrants in association with registrant information.
- items of registrant information include registrant identification information, name, registration date and time, and the like.
- the collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160, and calculates the similarity between the person imaged by the imaging unit 210 and each of the three-dimensional registered images.
- the collation unit 170 sets each three-dimensional registered image having a similarity equal to or higher than a threshold value as a target for visual collation by the user. There may be a plurality of three-dimensional registered images set as targets for visual collation.
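The patent does not specify how the similarity is computed; the sketch below assumes feature embeddings and cosine similarity purely for illustration, and keeps every registered image at or above the threshold, since several may qualify.

```python
import numpy as np

def select_candidates(probe_embedding, registered_embeddings, threshold=0.8):
    """Return (registrant_id, similarity) pairs whose cosine similarity to
    the probe is at or above the threshold, best first. The embedding-based
    matching and the threshold value are assumptions, not the patent's."""
    p = probe_embedding / np.linalg.norm(probe_embedding)
    hits = []
    for reg_id, emb in registered_embeddings.items():
        sim = float(np.dot(p, emb / np.linalg.norm(emb)))
        if sim >= threshold:
            hits.append((reg_id, sim))
    return sorted(hits, key=lambda t: t[1], reverse=True)
```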
- the output unit 180 outputs a three-dimensional registered image set as a target for visual collation and a corrected image (corrected image) processed by the image processing unit 150.
- the user terminal 300 includes a display unit 310 and an operation reception unit 320.
- the user terminal 300 is a terminal that provides information to the user and accepts the user's operations. Specifically, it is a device that prompts the user to determine whether the three-dimensional registered image of a person registered in a database (not shown) and the person captured by a security camera or the like are the same person. It may also be a device with which the user checks images captured by a security camera or the like, but the present invention is not limited to this.
- the display unit 310 displays the three-dimensional registered image output by the output unit 180 and the corrected image processed by the image processing unit 150.
- the display unit 310 may display the three-dimensional registered image and the corrected image at the same time.
- the display unit 310 may display a plurality of three-dimensional registered images at the same time, or may sequentially display a plurality of three-dimensional registered images according to the degree of similarity.
- the display unit 310 may display a plurality of corrected images at the same time. Further, the display unit 310 may display, for example, the three-dimensional registered image and the corrected image on the same screen.
- the user determines whether or not at least one three-dimensional registered image displayed by the display unit 310 and the corrected image indicate the same person, and inputs the determination result to the operation reception unit 320.
- the operation reception unit 320 accepts the input of the judgment result judged by the user by the operation of the user.
- the operation accepted by the operation receiving unit 320 is, for example, an operation of selecting an image of the same person as the corrected image to be collated from the three-dimensional registered images displayed on the display unit 310.
- FIGS. 7 to 9 are flowcharts showing an example of the processing in the present embodiment.
- FIG. 7 shows an example of a flow related to image processing.
- the input unit 110 receives the input of the captured image output by the imaging device 200 (S101).
- the input unit 110 may sequentially accept inputs of images captured in time series according to a predetermined frame rate, for example.
- the detection unit 120 detects the human head and eyeglasses from the captured image received by the input unit 110 (S102).
- the detection unit 120 may detect the human head and the wearing of spectacles in a stepwise manner, or may detect each independently. If the detection unit 120 does not detect spectacles in the captured image (S103, NO), the process returns to step S101 and the input of the next captured image is accepted (S101).
- if the detection unit 120 detects spectacles in the captured image (S103, YES), the detection unit 120 determines whether the contour of the face is reflected inside a lens of the spectacles in that image (S104).
- if the contour is reflected inside the lens, the extraction unit 130 extracts that captured image (S106); otherwise, the process returns to step S101. The power estimation unit 140 then estimates the power of the lens in the extracted image (S107).
- the image processing unit 150 corrects the image extracted by the extraction unit 130 using the power information of the lens estimated by the power estimation unit 140, and generates a corrected image (S108).
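Tying the steps together, a minimal orchestration sketch of the FIG. 7 flow is shown below; the four callables stand in for the detection unit, the contour check, the power estimation unit, and the image processing unit, all of which the patent defines only functionally.

```python
def image_processing_flow(frames, detect_glasses, has_inner_contour,
                          estimate_power, correct_image):
    """Sketch of S101-S108: accept input images, skip frames without
    spectacles or without the contour inside the lens, then estimate the
    lens power and generate corrected images."""
    corrected_images = []
    for frame in frames:                      # S101: accept an input image
        if not detect_glasses(frame):         # S102/S103: head and glasses?
            continue                          # S103 NO -> next frame
        if not has_inner_contour(frame):      # S104: contour inside lens?
            continue                          # -> back to S101
        power = estimate_power(frame)         # S106 extract + S107 estimate
        corrected_images.append(correct_image(frame, power))  # S108
    return corrected_images
```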
- FIG. 8 shows an example of a processing flow related to collation by the processing device 100.
- the collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160, and calculates the similarity between the person imaged by the imaging unit 210 and each of the plurality of registrants (S109).
- if the similarity is equal to or higher than a threshold value (S110, YES), the three-dimensional registered image is set as a target for visual collation by the user (S111); otherwise (S110, NO), the process ends.
- the series of flows related to the image processing shown in FIG. 7 (S101 to S108) may then be executed. In that case, since the power of the lens is estimated only for images in which a lens is recognized, the processing load of the device can be reduced.
- the flow related to collation (S109 to S111) and the flow related to image processing (S101 to S108) may be processed in parallel.
- the flow related to image processing (S101 to S108) may be executed before the flow related to collation (S109 to S111).
- that is, the order of the series of flows related to image processing shown in FIG. 7 (S101 to S108) and the processing flow related to collation shown in FIG. 8 (S109 to S111) is not limited.
- FIG. 9 shows an example of a processing flow related to image display.
- the output unit 180 outputs a three-dimensional registered image set as a target for visual collation and a corrected image (corrected image) processed by the image processing unit 150 (S112).
- the display unit 310 displays the three-dimensional registered image output by the output unit 180 and the corrected image processed by the image processing unit 150 (S113).
- the operation reception unit 320 accepts the input of the judgment result judged by the user by the operation of the user (S114).
- the operation accepted by the operation receiving unit 320 is, for example, an operation of selecting an image of the same person as the corrected image to be collated from the three-dimensional registered images displayed on the display unit 310.
- the processing flow related to the display of images shown in FIG. 9 (S112 to S114) is executed after the series of flows related to image processing shown in FIG. 7 (S101 to S108) and the processing flow related to collation shown in FIG. 8 (S109 to S111).
- even when the collation target is a person wearing a lens, the present embodiment reduces the change in appearance caused by the lens, so the user can perform visual collation efficiently and accurately.
- the second embodiment differs from the first embodiment in that the processing device 100 includes a three-dimensional image processing unit 190 and corrects the three-dimensional registered image based on the estimated power information of the lens.
- the description of the parts common to the first embodiment will be omitted.
- FIG. 10 is a functional block diagram showing the configuration of the collation assistance system 1 in the present embodiment.
- the processing device 100 includes an input unit 110, a detection unit 120, an extraction unit 130, a power estimation unit 140, an image processing unit 150, a storage unit 160, a collation unit 170, an output unit 180, and a three-dimensional image processing unit 190.
- the three-dimensional image processing unit 190 corrects the image around the eyes of the person in the three-dimensional registered image set by the collation unit 170 as the target of visual collation, and generates a three-dimensional corrected image corresponding to the three-dimensional registered image.
- the predetermined three-dimensional registered image is corrected according to the power information of the lens estimated by the power estimation unit 140.
- the predetermined three-dimensional registered image is a three-dimensional registered image set by the collation unit 170 as a target for visual collation.
- Known three-dimensional computer graphics are used for this correction. An example of correction according to the power information of the lens will be described.
- for example, the three-dimensional image processing unit 190 uses the estimated power information of a concave lens to correct the image around the eyes so as to reproduce the change in facial features that would appear, due to lens refraction, if the person shown in the three-dimensional registered image wore that concave lens.
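One way to quantify that change is the power factor of spectacle magnification from classical optics, M = 1 / (1 − d·P), with vertex distance d in meters and lens power P in diopters; using this formula is an assumption here, since the patent only says known three-dimensional computer graphics techniques are used.

```python
def spectacle_magnification(power_diopters, vertex_distance_m=0.012):
    """Power factor of spectacle magnification, M = 1 / (1 - d * P).
    A concave lens (P < 0) gives M < 1: the eyes look smaller and the
    contour shifts inward, which is the appearance to reproduce."""
    return 1.0 / (1.0 - vertex_distance_m * power_diopters)

# Example: a -4 D lens worn 12 mm in front of the eye gives
# spectacle_magnification(-4.0) ~= 0.954, so the 3D image processing unit 190
# would shrink the eye region of the rendered face by roughly 4.6 %.
```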
- the storage unit 160 stores a plurality of three-dimensional registered images in association with the registrant information. Examples of items of registrant information include registrant identification information, name, registration date and time, and the like.
- the storage unit 160 may store the three-dimensional corrected image generated by the three-dimensional image processing unit 190.
- the collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the three-dimensional registered images of the plurality of registrants stored in the storage unit 160, and calculates the similarity between the person imaged by the imaging unit 210 and each of the plurality of registrants.
- the collation unit 170 sets each three-dimensional registered image whose obtained similarity is equal to or higher than the threshold value as a target for visual collation by the user. There may be a plurality of three-dimensional registered images set as targets for visual collation.
- the output unit 180 outputs a three-dimensional corrected image showing a person set as a target for visual collation and an image extracted by the extraction unit 130.
- the three-dimensional corrected image is the three-dimensional image generated by the three-dimensional image processing unit 190.
- the display unit 310 displays a three-dimensional corrected image showing a person set as a target for visual collation and a two-dimensional image extracted by the extraction unit 130.
- the display unit 310 may display the three-dimensional corrected image and the two-dimensional image at the same time.
- the display unit 310 may display a plurality of three-dimensional images at the same time, or may sequentially display a plurality of three-dimensional images according to the obtained similarity.
- the display unit 310 may display a plurality of two-dimensional images at the same time. Further, the display unit 310 may display, for example, the three-dimensional corrected image and the two-dimensional image on the same screen.
- the user determines whether or not at least one three-dimensional image displayed by the display unit 310 and the two-dimensional image indicate the same person, and inputs the determination result to the operation reception unit 320.
- the operation reception unit 320 accepts the input of the judgment result judged by the user by the operation of the user.
- the operation received by the operation receiving unit 320 is, for example, an operation of selecting an image of the same person as the two-dimensional image to be collated from the three-dimensional images displayed on the display unit 310.
- FIG. 11 shows an example of a processing flow related to image processing in the present embodiment. The description of the process that overlaps with the first embodiment will be omitted.
- the processing flow related to the collation by the apparatus is the same as that in the first embodiment (S109 to S111).
- the power estimation unit 140 estimates the power of the glasses worn by a person in the image extracted by the extraction unit 130 (S107).
- the three-dimensional image processing unit 190 corrects the image around the eyes of the person in the three-dimensional registered image set by the collation unit 170 as the target of visual collation, using the estimated lens power information, and generates a three-dimensional corrected image (S208).
- the storage unit 160 stores the three-dimensional corrected image in association with the registrant information (S209).
- the output unit 180 outputs a three-dimensional corrected image showing a person set as a target for visual collation and an image extracted by the extraction unit 130 (S212).
- the display unit 310 displays a three-dimensional corrected image showing a person set as a target for visual collation and a two-dimensional image extracted by the extraction unit 130 (S213).
- the display unit 310 may display the three-dimensional corrected image and the two-dimensional image at the same time.
- the display unit 310 may display a plurality of three-dimensional images at the same time, or may sequentially display a plurality of three-dimensional images according to the degree of similarity.
- the display unit 310 may display a plurality of two-dimensional images at the same time.
- the user determines whether or not at least one three-dimensional image displayed by the display unit 310 and the two-dimensional image indicate the same person, and inputs the determination result to the operation reception unit 320.
- the operation receiving unit 320 receives the input of the judgment result judged by the user by the operation of the user (S114).
- the flow related to collation (S109 to S111) and the flow related to image processing (S101 to S107, S208 to S209) may be processed in parallel.
- the flow related to image processing (S101 to S107, S208 to S209) may be executed before the flow related to collation (S109 to S111).
- that is, the order of the series of flows related to image processing shown in FIGS. 7 and 11 (S101 to S107, S208 to S209) and the processing flow related to collation shown in FIG. 8 (S109 to S111) is not limited.
- according to the collation assistance system of the present embodiment, when the imaged person is wearing glasses, the three-dimensional registered image held in the database is corrected according to the change in appearance caused by wearing the glasses. As a result, even if the collation target is a person wearing spectacles, changes in appearance and in facial impression due to the spectacles are reduced, and the user can perform visual collation efficiently and accurately.
- FIG. 13 is a functional block diagram of the collation assistance system 1 according to the present embodiment.
- the collation assistance system 1 includes a detection unit 120 and an output unit 180.
- the detection unit 120 detects an input image including a person wearing a lens in at least one input image obtained by capturing a person.
- the output unit 180 outputs a corrected image, which is an image obtained by correcting the input image using the power information of the lens estimated from the input image detected by the detection unit 120, together with a three-dimensional registered image that is a collation target of the corrected image.
- the corrected image is, for example, an image in which the position and size of the eyes reflected on the lens in the input image are corrected.
- FIG. 14 is a flowchart showing an example of the processing in the present embodiment.
- the detection unit 120 detects an input image including a person wearing a lens in the captured image (S307).
- the output unit 180 outputs the corrected image, obtained by correcting the input image using the power information of the lens estimated from the input image detected by the detection unit 120, together with the three-dimensional registered image that is a collation target of the corrected image (S308).
- according to the present embodiment, even when the collation target is a person wearing spectacles, changes in appearance caused by the spectacles can be reduced.
- FIG. 13 is a functional block diagram of the collation assistance system 1 according to the present embodiment.
- the collation assistance system 1 includes a detection unit 120 and an output unit 180.
- the detection unit 120 detects an input image including a person wearing eyeglasses having a lens in at least one input image obtained by capturing a person.
- the output unit 180 outputs a three-dimensional corrected image, obtained by correcting the three-dimensional registered image using the power of the lens estimated from the input image, together with the input image.
- the three-dimensional corrected image output by the output unit 180 is, for example, an image in which the position and size of the eyes of the three-dimensional registered image to be collated with the input image are corrected.
- FIG. 15 is a flowchart showing an example of the processing in the present embodiment.
- the detection unit 120 detects an input image including a person wearing eyeglasses having a lens in at least one input image obtained by capturing a person (S407).
- the output unit 180 outputs the three-dimensional corrected image, obtained by correcting the three-dimensional registered image using the power of the lens estimated from the input image, together with the input image (S408).
- even when the collation target is a person wearing glasses, it is possible to reduce changes in appearance and in facial impression caused by the glasses, and to generate images with which the user can perform visual collation efficiently and accurately.
- Each functional unit included in the processing device 100, the image pickup device 200, and the user terminal 300 is realized by an arbitrary combination of hardware and software, centered on at least one CPU (Central Processing Unit) of an arbitrary computer, at least one memory, a program loaded into the memory, a storage unit such as at least one hard disk storing the program, and an interface for network connection. Those skilled in the art will understand that there are various modifications of this realization method and apparatus.
- the storage unit can store not only programs stored before the device is shipped, but also programs downloaded from storage media such as optical disks, magneto-optical disks, and semiconductor flash memories, or from servers on the Internet.
- FIG. 16 is a block diagram illustrating the hardware configurations of the processing device 100, the imaging device 200, and the user terminal 300.
- the processing device 100, the image pickup device 200, and the user terminal 300 include a processor 1A, a memory 2A, an input / output interface 3A, a peripheral circuit 4A, a communication interface 5A, and a bus 6A.
- the peripheral circuit 4A includes various modules.
- the processing device 100, the image pickup device 200, and the user terminal 300 do not have to have the peripheral circuit 4A.
- the processing device 100, the image pickup device 200, and the user terminal 300 may be composed of a plurality of physically and / or logically separated devices. In this case, each of the plurality of devices can be provided with the above hardware configuration.
- the bus 6A is a data transmission path for the processor 1A, the memory 2A, the input / output interface 3A, the peripheral circuit 4A, and the communication interface 5A to send and receive data to and from each other.
- the processor 1A is, for example, an arithmetic processing unit such as a CPU, a GPU (Graphics Processing Unit), or a microprocessor.
- the processor 1A can execute the process according to various programs stored in the memory 2A, for example.
- the memory 2A is, for example, a memory such as a RAM (Random Access Memory) or a ROM (Read Only Memory), and stores programs and various data.
- the input / output interface 3A includes an interface for acquiring information from an input device, an external device, an external storage unit, an external sensor, a camera, etc., an interface for outputting information to an output device, an external device, an external storage unit, etc. including.
- the input device is, for example, a touch panel, a keyboard, a mouse, a microphone, a camera, or the like.
- the output device is, for example, a display, a speaker, a printer, a lamp, or the like.
- Processor 1A can issue commands to each module and perform calculations based on the calculation results.
- the communication interface 5A enables the processing device 100, the image pickup device 200, and the user terminal 300 to communicate with external devices and with one another.
- some of the functions of the processing device 100, the imaging device 200, and the user terminal 300 may be implemented by a separate computer.
- <Modification examples> Examples of modifications applicable to the above-described embodiments will be described. If the power estimation unit 140 fails to estimate the power of the lens in a target captured image, the captured images taken before and after that image may be processed again. Further, the image processing unit 150 may perform image processing on the captured image so as to remove the rims of the spectacles.
- the imaging unit 210 may include a device that acquires three-dimensional depth information (depth information).
- in this case, the calculation unit 141 does not need to calculate the distance between the image pickup unit 210 and the person (or the person's face) from the two-dimensional image; the estimation unit 143 may estimate the power of the lens using the depth information acquired by the image pickup unit 210.
- instead of collating the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160, the collation unit 170 may collate the corrected image processed by the image processing unit 150 against the three-dimensional registered images.
- the image pickup apparatus 200 may execute a part or all of the processes executed by the processing apparatus 100.
- the image pickup apparatus 200 may include a detection unit 120, an extraction unit 130, and a power estimation unit 140 in addition to the image pickup unit 210.
- that is, the image pickup device 200 may estimate the power of the lens, and the image processing unit 150 included in the processing device 100 may acquire the information regarding the power of the lens from the image pickup device 200.
- the image pickup apparatus 200 may execute the processes of S101 to S107.
- the imaging device 200 may also restrict the processing of estimating the power of the lens, or of transmitting images to the processing device 100, to captured images in which the contour difference of the person shown in the image is equal to or greater than a threshold value. According to the above modifications, the processing load of the processing device 100 and the load related to the transmission processing of the collation assisting system can be reduced.
- (Appendix 1) A collation assisting device comprising: a detection unit that detects, in at least one input image of a person, an input image including a person wearing a lens; and an output unit that outputs a corrected image, which is an image in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
- (Appendix 2) The collation assisting device according to Appendix 1, further comprising a display unit that displays the corrected image and the three-dimensional registered image on the same screen.
- (Appendix 3) The collation assisting device according to Appendix 2, wherein the display unit displays a three-dimensional registered image whose similarity, obtained by collating the input image corresponding to the corrected image with the three-dimensional registered images of a plurality of registrants, is equal to or higher than a threshold value.
- (Appendix 4) A collation assisting device comprising: a detection unit that detects, in at least one input image of a person, an input image including a person wearing eyeglasses having a lens; and an output unit that outputs a three-dimensional corrected image, obtained by correcting the eye position and size of the three-dimensional registered image to be collated with the input image using the power of the lens estimated from the input image, and the input image.
- (Appendix 5) The collation assisting device according to Appendix 4, further comprising a display unit that displays the three-dimensional corrected image and the input image on the same screen.
- (Appendix 6) The collation assisting device according to Appendix 5, wherein the display unit displays the three-dimensional corrected image corresponding to a three-dimensional registered image whose similarity obtained by collation with the input image is equal to or higher than the threshold value.
- (Appendix 7) The collation assisting device according to any one of Appendices 1 to 6, further comprising an estimation unit that estimates the power of the lens using the positional relationship between the contour seen inside the lens and the contour seen outside the lens in the input image, and the positional relationship between the imaging unit that photographed the person wearing the lens and that person.
- (Appendix 8) The collation assisting device according to Appendix 7, wherein the estimation unit estimates the power of the lens using an image, among a plurality of input images of the person, in which the distance between the contours seen inside and outside the lens worn by the person is equal to or greater than a threshold value.
- (Appendix 9) The collation assisting device according to Appendix 7 or 8, wherein the estimation unit estimates the power of the lens shown in the input image for an input image whose similarity, obtained by collation with any of the three-dimensional registered images of a plurality of registrants, is equal to or higher than a threshold value.
Abstract
A collation assistance device according to the present invention is provided with: a detection unit that, in order to assist in collating a two-dimensional input image against a previously registered three-dimensional image, detects, from at least one image in which a person is captured, an input image that includes a person wearing a lens; and an output unit for outputting a post-correction image, in which the position and size of an eye reflected by the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image, which is to be collated with the post-correction image.
Description
The present invention relates to a collation assisting device, a collation assisting method, and a program storage medium.
In some cases, a person photographed by a security camera or the like is visually collated with a person registered in a database. Patent Document 1 discloses a technique that makes it easier to collate a captured two-dimensional image with the three-dimensional image of a person registered in a database by changing the orientation and size of the three-dimensional image when the user performs the visual collation.
In the technique described in Patent Document 1, collation by the user is facilitated by changing the orientation and size of the three-dimensional image registered in the database. However, when the person shown in the captured image wears spectacles, the position and size of the eyes seen through the lenses differ from those of the naked eye because of lens refraction. Since this effect becomes more pronounced depending on the orientation of the person's face, the accuracy with which the user can visually judge whether the two-dimensional and three-dimensional images show the same person is reduced. Patent Document 1 does not consider this situation.
Therefore, in view of the above problems, an object of the present invention is to improve collation accuracy by reducing the change in appearance caused by wearing spectacles.
According to one aspect of the present invention, there is provided a collation assisting device comprising: a detection unit that detects, in at least one input image of a person, an input image including a person wearing a lens; and an output unit that outputs a corrected image, which is an image in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
According to another aspect of the present invention, there is provided a collation assisting method in which, in at least one input image of a person, an input image including a person wearing a lens is detected, and a corrected image, in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, is output together with a three-dimensional registered image that is a collation target of the corrected image.
According to another aspect of the present invention, there is provided a program storage medium storing a program that causes a computer to execute: a process of detecting, in at least one input image of a person, an input image including a person wearing a lens; and a process of outputting a corrected image, in which the position and size of the eyes seen through the lens are corrected using the power of the lens estimated from the input image, together with a three-dimensional registered image that is a collation target of the corrected image.
According to another aspect of the present invention, there is provided a collation assisting device comprising: a detection unit that detects, in at least one input image of a person, an input image including a person wearing glasses having a lens; and an output unit that outputs a three-dimensional corrected image, in which the eye position and size of the three-dimensional registered image to be collated with the input image are corrected using the power of the lens estimated from the input image, together with the input image.
According to another aspect of the present invention, there is provided a collation assisting method in which an input image including a person wearing glasses having a lens is detected in at least one input image of a person, and a three-dimensional corrected image, obtained by correcting the eye position and size of the three-dimensional registered image to be collated with the input image using the power of the lens estimated from the input image, is output together with the input image.
According to another aspect of the present invention, there is provided a program storage medium storing a program that causes a computer to execute: a process of detecting, in at least one input image of a person, an input image including a person wearing glasses having a lens; and a process of outputting a three-dimensional corrected image, obtained by correcting the eye position and size of the three-dimensional registered image to be collated with the input image using the power of the lens estimated from the input image, together with the input image.
According to the present invention, there are provided a collation assisting device, a collation assisting method, and a program that can improve collation accuracy by reducing the changes in appearance caused by wearing eyeglasses.
Hereinafter, exemplary embodiments of the present invention will be described with reference to the drawings. In the drawings, similar or corresponding elements are denoted by the same reference numerals, and their description may be omitted or simplified.
<Background>
People who wear eyeglasses for vision correction look different in facial features from when they are not wearing them. This is because the refraction of the lenses makes the position and size of the eyes appear to change. For an administrator who checks the images captured by a camera, it is not easy to visually judge whether a person whose facial features appear changed by eyeglasses corresponds to any of the person images registered in a database. The embodiments described below explain a system that assists such an administrator (user).
<First Embodiment>
The configuration of the collation assisting system 1 of the present embodiment will be described with reference to FIGS. 1 and 2. FIG. 1 shows an overall configuration example of the collation assisting system 1 according to the present embodiment. The collation assisting system is an information processing system comprising a processing device 100, an image pickup device 200, and a user terminal 300, with each device and terminal connected via a network. A specific example of the situation shown in FIG. 1 is one in which a person is photographed by a camera installed on the street.
The image pickup device 200 is a terminal that captures a person and obtains a captured image, for example, a security camera installed on a street. The image pickup device 200 images a person passing through the imaging section and outputs the captured image to the processing device 100.
FIG. 2 is a functional block diagram showing the configuration of the collation assisting system 1. The collation assisting system 1 comprises the processing device 100, the image pickup device 200, and the user terminal 300. The processing device 100 includes an input unit 110, a detection unit 120, an extraction unit 130, a power estimation unit 140, an image processing unit 150, a storage unit 160, a collation unit 170, and an output unit 180. The image pickup device 200 includes an imaging unit 210. The user terminal 300 includes a display unit 310 and an operation reception unit 320.
The imaging unit 210 of the image pickup device 200 images a person passing through the imaging section and outputs the captured image to the processing device 100. The imaging unit 210 images the imaging section in time series according to, for example, a set frame rate. The imaging unit 210 is not limited to this specific example: it may capture an image at a timing instructed from the outside, or may capture a still image at a predetermined timing. In the present embodiment, the captured image only needs to include at least the head of the person and need not include the person's whole body.
The input unit 110 receives the captured images output by the imaging unit 210. The images the input unit 110 accepts (input images) may be all of the images captured by the imaging unit 210 or only some of them. For example, the input unit 110 may extract images at predetermined time intervals from the images captured in time series by the imaging unit 210 and accept them sequentially. As one way of extracting images at predetermined time intervals, it may thin out some of the images captured at a predetermined frame rate. The images the input unit 110 accepts are not limited to the above.
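For illustration only, a minimal sketch of the kind of frame thinning described above; the interval `step` and the use of OpenCV's `VideoCapture` are assumptions, not part of the embodiment.

```python
import cv2  # OpenCV, used here only to read a video stream

def thinned_frames(source, step=10):
    """Yield every `step`-th frame from a video source, emulating an
    input unit that accepts captured images at a fixed interval."""
    cap = cv2.VideoCapture(source)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame  # this frame becomes an "input image"
        index += 1
    cap.release()
```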
The detection unit 120 detects a human head and eyeglasses in the captured images received by the input unit 110. For this detection, for example, a model trained by machine learning on images of heads wearing eyeglasses is used; the machine learning method may be, for example, deep learning with a multi-layer neural network. The method used to detect a person's head and the wearing of eyeglasses is not limited to the above. The detection unit 120 may detect the human head and the wearing of eyeglasses in stages, or may detect each independently. In the staged case, the wearing of eyeglasses may be detected after the human head is detected. The detection unit 120 does not necessarily have to detect the head and may detect only the eyeglasses.
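The patent does not specify the trained model, so as a rough stand-in the sketch below uses OpenCV's bundled Haar cascades, including its eye-with-eyeglasses cascade, to detect the head first and then eyeglasses; the cascade choice and thresholds are assumptions.

```python
import cv2

face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
glasses_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

def detect_head_and_glasses(image_bgr):
    """Return (head_box, has_glasses) for the first detected face,
    detecting the head first and then eyeglasses (the staged variant)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_det.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None, False
    x, y, w, h = faces[0]
    # Look for spectacle-framed eyes only inside the detected head region.
    eyes = glasses_det.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 5)
    return (x, y, w, h), len(eyes) > 0
```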
From the captured images in which the wearing of eyeglasses has been detected, the detection unit 120 detects images in which the contour of the face appears inside a lens of the eyeglasses.
Next, the state of the face in a captured image will be described with reference to FIG. 3, which shows an example of a face whose contour appears inside an eyeglass lens. In the figure, FL denotes the contour of the person's face, GL1 and GL2 denote the lenses, and VFL denotes the facial contour seen inside a lens. FIG. 3A shows a state in which the person's face is directed straight at the imaging unit 210 (a face tilt angle of 0 degrees), and FIG. 3B shows a state in which it is not (a face tilt angle other than 0 degrees); as an example, the person faces to the right. A face is regarded as facing front when its tilt angle relative to the imaging direction is close to 0 degrees and smaller than a predetermined threshold, and as not facing front when the tilt angle exceeds that threshold; the threshold can be set arbitrarily. As shown in FIG. 3A, when the face is directed straight at the imaging unit 210 (a tilt angle of 0 degrees), the facial contour is unlikely to appear inside the lenses GL1 and GL2. On the other hand, as shown in FIG. 3B, when the face is not directed straight at the imaging unit 210 (a tilt angle other than 0 degrees), the facial contour readily appears inside the lens GL1 because of refraction, compared with the front-facing case.
The detection unit 120 estimates the tilt angle of the face in a captured image relative to the imaging direction. The tilt angle of the face is the angle at which the person's face is directed, measured with reference to the straight line passing through the imaging unit 210 and the person's head in three-dimensional space. Specifically, when the person's face is directed at the imaging unit 210, the tilt angle is close to 0 degrees and smaller than a predetermined threshold. The image in which the detection unit 120 detects the face may be a two-dimensional image extracted by the extraction unit 130, described later.
The extraction unit 130 extracts, as a two-dimensional image, a captured image in which the facial contour appears inside a lens. As illustrated in FIG. 3B, the extraction unit 130 extracts an image in which the facial contour is visible inside the lens; in other words, an image in which the person's face is turned obliquely enough, as seen from the imaging unit 210, for the contour to appear inside the lens. The extraction unit 130 may extract images whose face tilt angle is at least an arbitrarily set first threshold, or at most an arbitrarily set second threshold, where the second threshold may be set larger than the first.
The "extraction" performed by the extraction unit 130 refers to picking out some captured images from among many. Specifically, from images captured in time series at a predetermined frame rate, it may extract only those in which the wearing of eyeglasses has been detected. "Extraction" may also include cropping a partial region from a captured image: for example, cropping the region showing a head wearing eyeglasses, or the region showing a person, from an image containing several people. The processing denoted by "extraction" is not limited to these examples.
The power estimation unit 140 estimates the power of the eyeglass lenses worn by the person in the two-dimensional image extracted by the extraction unit 130. The details of the power estimation unit 140 will be described with reference to FIG. 4, which is a functional block diagram of the power estimation unit 140. The power estimation unit 140 has a calculation unit 141, a learning model storage unit 142, and an estimation unit 143.
The calculation unit 141 calculates the distance in three-dimensional space between the imaging unit 210 and the face of the person it has imaged. The distance may be calculated from, for example, the size of the region showing the face or the length of the region showing the person in the captured image, but the calculation method is not limited to these, and those skilled in the art can apply well-known techniques as appropriate.
The calculation unit 141 calculates the difference between the positions of the facial contour seen inside and outside the lens (referred to herein as the "contour difference"). The concept is explained with reference to FIG. 5, which schematically shows part of a face wearing a lens. The calculation unit 141 calculates the horizontal pixel distance (first scanning distance E1) between the horizontal position of the center of the eye and the contour position E4 inside the lens region, and the horizontal pixel distance (second scanning distance E2) between the horizontal position of the center of the eye and the contour position E3 outside the lens region.
The calculation unit 141 calculates a contour difference normalized by the distance between the imaging unit 210 and the face (the normalized contour difference) from the difference between the first scanning distance E1 and the second scanning distance E2. The specific normalization method is not limited; for example, the difference between E1 and E2 may be divided by the distance between the imaging unit 210 and the face. Any formula that takes the camera-to-face distance into account may be used.
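A minimal sketch of the contour-difference computation just described. The landmark coordinates are assumed inputs (the text does not specify how they are obtained), and dividing by the camera-to-face distance is one of the normalizations the text allows.

```python
def normalized_contour_difference(eye_center_x, contour_inside_x,
                                  contour_outside_x, camera_face_distance_m):
    """Compute the normalized contour difference from the horizontal pixel
    positions of the eye center, the contour seen inside the lens (E4),
    and the contour seen outside the lens (E3)."""
    e1 = abs(contour_inside_x - eye_center_x)   # first scanning distance E1
    e2 = abs(contour_outside_x - eye_center_x)  # second scanning distance E2
    return (e1 - e2) / camera_face_distance_m   # one permissible normalization

# Example: a concave (myopia) lens shifts the inner contour toward the eye,
# so E1 is typically smaller than E2 and the difference is negative.
print(normalized_contour_difference(400, 430, 445, 2.5))
```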
The calculation unit 141 calculates the angle at which light from the object shown at any given pixel of the captured image is incident on the imaging unit 210. An example of calculating the incident angle Ψ from the real object corresponding to an arbitrary pixel position will be described with reference to FIGS. 6 and 17. FIG. 6 schematically shows the optical system of the imaging unit 210 imaging a person: a CCD (Charge-Coupled Device) sensor CS detects light entering from the person through the camera lens P. Formula (1) in FIG. 17 shows the derivation of the incident angle Ψ; its variables are as follows. Taking as the variable the first pixel distance xs from the center of the captured image, which covers the entire imaging range of the imaging unit 210, the incident angle Ψ can be expressed by formula (1) in FIG. 17 using the second pixel distance c from the center of the captured image to the center of the face shown in it, the number of pixels XL of the captured image, the face orientation Θ detected by the detection unit 120, and the angle of view Φ of the imaging unit 210. The calculation unit 141 calculates the incident angle at the position of the facial contour seen inside the lens, for example using formula (1) in FIG. 17.
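Formula (1) itself appears only in FIG. 17 and is not reproduced in the text, so the following is not the patent's formula. It is a sketch of the standard pinhole-camera relation between a pixel offset and a ray angle, reusing the variable names xs, XL, and Φ where they apply.

```python
import math

def pixel_to_ray_angle(xs, xl, phi_deg):
    """Angle (degrees) between the optical axis and the ray that hits a
    pixel offset xs (pixels) from the image center, for an image xl pixels
    wide with a horizontal angle of view phi_deg. Standard pinhole-camera
    geometry, not formula (1) from FIG. 17."""
    focal_px = (xl / 2.0) / math.tan(math.radians(phi_deg) / 2.0)
    return math.degrees(math.atan(xs / focal_px))

# Example: a pixel 300 px from center, 1920-px-wide image, 60-degree FOV.
print(pixel_to_ray_angle(300, 1920, 60.0))
```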
The learning model storage unit 142 stores a regression model that outputs the lens power given the incident angle and the normalized contour difference as inputs. The regression model is a trained model that has learned, in advance, combinations of incident angle and normalized contour difference paired with lens power.
The estimation unit 143 inputs the incident angle and contour difference calculated by the calculation unit 141 into the regression model stored in the learning model storage unit 142 and estimates the power of the lens.
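The regression technique is not specified, so the sketch below trains a generic scikit-learn regressor on (incident angle, normalized contour difference) pairs labeled with diopters; the training data and the model family are placeholders, not the patent's model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder training pairs: (incident angle [deg], normalized contour
# difference) -> lens power [diopters]. Real data would come from
# measurements of known lenses.
X_train = np.array([[10.0, -0.8], [15.0, -1.5], [20.0, -2.4], [25.0, -3.1]])
y_train = np.array([-1.0, -2.0, -3.0, -4.0])

model = GradientBoostingRegressor().fit(X_train, y_train)  # storage unit 142

def estimate_power(incident_angle_deg, norm_contour_diff):
    """Estimation unit 143: query the stored regression model."""
    return float(model.predict([[incident_angle_deg, norm_contour_diff]])[0])

print(estimate_power(18.0, -2.0))
```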
The regression model described above may instead be trained to take the incident angle, the contour difference before normalization, and the camera-to-face distance as inputs and to output the lens power. In that case the contour difference need not be normalized, which reduces the processing load.
The function of the power estimation unit 140 described above is one example: it estimates the lens power from the positional relationship between the contour seen inside the lens and the contour seen outside the lens in the captured image, together with the positional relationship between the imaging unit 210 and the person wearing the lens. Alternatively, to estimate the lens power the power estimation unit 140 may use, for example, the method described in Patent Document 2 (Japanese Unexamined Patent Publication No. 2015-25859), which estimates the power from the position of the facial contour seen inside an eyeglass lens, the position of the facial contour visible outside the lens, and the tilt angle of the face. The method of estimating the lens power is not limited to the above.
The image processing unit 150 corrects the two-dimensional image using the lens power information estimated by the power estimation unit 140 and generates a corrected image corresponding to that two-dimensional image. The region to be corrected is the area inside and around the lens in the image; the facial contour and the position and size of the person's eyes seen inside the lens whose power was estimated are corrected. Specifically, the region seen inside the lens is magnified so that the contour inside the lens and the contour outside the lens lie on the same curve.
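A simplified sketch of this kind of correction: it magnifies the lens region about its center so the inner contour lines up with the outer one. Using the ratio E2/E1 as the magnification and OpenCV's resize as the warp are assumptions; the patent does not prescribe a specific warp.

```python
import cv2
import numpy as np

def magnify_lens_region(image, lens_box, scale):
    """Enlarge the region inside the lens by `scale` about its center and
    paste the central crop back, roughly undoing the minification caused
    by a concave lens. lens_box = (x, y, w, h) in pixels."""
    x, y, w, h = lens_box
    roi = image[y:y + h, x:x + w]
    big = cv2.resize(roi, None, fx=scale, fy=scale,
                     interpolation=cv2.INTER_LINEAR)
    bh, bw = big.shape[:2]
    y0, x0 = (bh - h) // 2, (bw - w) // 2  # center crop back to lens size
    out = image.copy()
    out[y:y + h, x:x + w] = big[y0:y0 + h, x0:x0 + w]
    return out

# Example: scale chosen so the inner contour moves onto the outer one,
# e.g. scale = E2 / E1 = 45 / 30 from the contour measurements.
corrected = magnify_lens_region(np.zeros((480, 640, 3), np.uint8),
                                (300, 200, 80, 60), 45 / 30)
```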
The storage unit 160 stores three-dimensional images of a plurality of registrants (three-dimensional registered images) in association with registrant information. Examples of registrant information items include the registrant's identification information, name, and date and time of registration.
The collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160 and calculates the similarity between the person imaged by the imaging unit 210 and each of the three-dimensional registered images. The collation unit 170 sets three-dimensional registered images whose similarity is at or above a threshold as targets for visual collation by the user; more than one image may be set as a target.
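The similarity measure is not specified in the text. The sketch below assumes face embeddings are already available for the query image and the registered images, and builds the visual-collation shortlist with cosine similarity and a threshold.

```python
import numpy as np

def shortlist_for_visual_collation(query_vec, registered, threshold=0.6):
    """Return (registrant_id, similarity) pairs whose cosine similarity to
    the query embedding is at or above the threshold, sorted so the most
    similar registrants are presented to the user first."""
    hits = []
    q = query_vec / np.linalg.norm(query_vec)
    for reg_id, vec in registered.items():
        sim = float(q @ (vec / np.linalg.norm(vec)))
        if sim >= threshold:
            hits.append((reg_id, sim))
    return sorted(hits, key=lambda p: p[1], reverse=True)

# Example with toy 3-dimensional embeddings.
db = {"alice": np.array([1.0, 0.1, 0.0]), "bob": np.array([0.0, 1.0, 0.2])}
print(shortlist_for_visual_collation(np.array([0.9, 0.2, 0.0]), db))
```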
The output unit 180 outputs the three-dimensional registered images set as targets for visual collation, together with the corrected image produced by the image processing unit 150.
The user terminal 300 includes the display unit 310 and the operation reception unit 320. The user terminal 300 is a terminal that presents information to the user and accepts the user's operations. Specifically, the user terminal 300 prompts the user to judge whether the three-dimensional registered image of a person held in a database (not shown) and a person imaged by a security camera or the like are the same person. It may also be a device with which the user reviews footage captured by a security camera or the like, but it is not limited to these.
The display unit 310 displays the three-dimensional registered images output by the output unit 180 and the corrected image produced by the image processing unit 150. The display unit 310 may display a three-dimensional registered image and the corrected image at the same time. It may display several three-dimensional registered images simultaneously, or display them one after another according to their similarity. It may also display several corrected images simultaneously, and it may display, for example, a three-dimensional registered image and the corrected image on the same screen.
The user judges whether at least one of the three-dimensional registered images displayed by the display unit 310 and the corrected image show the same person, and enters the judgment into the operation reception unit 320. The operation reception unit 320 accepts the input of the user's judgment. The operation it accepts is, for example, selecting, from the three-dimensional registered images shown on the display unit 310, the image showing the same person as the corrected image being collated.
Next, the operation of the collation assistance system of the present embodiment will be described with reference to FIGS. 7 to 9, which are flowcharts showing examples of the processing in the present embodiment.
FIG. 7 shows an example of the flow of image processing. First, the input unit 110 receives the captured images output by the imaging device 200 (S101). The input unit 110 may, for example, sequentially accept images captured in time series at a predetermined frame rate.
The detection unit 120 detects a human head and eyeglasses in the captured images received by the input unit 110 (S102). It may detect the head and the wearing of eyeglasses in stages or independently. If the detection unit 120 does not detect eyeglasses in a captured image (S103, NO), the flow returns to step S101 and the next captured image is accepted (S101). If the detection unit 120 detects eyeglasses (S103, YES), it judges whether the facial contour appears inside a lens of the eyeglasses in the images where the wearing of eyeglasses was detected (S104).
If it is judged that the contour appears inside the lens (S105, YES), the extraction unit 130 extracts the captured image in which the contour appears inside the lens (S106). If it is judged that the facial contour does not appear inside the lens (S105, NO), the flow returns to step S101.
The power estimation unit 140 estimates the power of the lens worn by the person in the image extracted by the extraction unit 130 (S107).
The image processing unit 150 corrects the image extracted by the extraction unit 130 using the lens power information estimated by the power estimation unit 140 and generates a corrected image (S108).
FIG. 8 shows an example of the processing flow of collation by the processing device 100. The collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160 and calculates the similarity between the person imaged by the imaging unit 210 and each of the registrants (S109). If there is a three-dimensional registered image whose similarity is at or above the threshold (S110, YES), that image is set as a target for visual collation by the user (S111). If there is no such image (S110, NO), the processing ends.
In the present embodiment, the series of image processing steps shown in FIG. 7 (S101 to S108) may be executed after the collation flow shown in FIG. 8 (S109 to S111). In that case the lens power is estimated only for images in which a lens was recognized, so the processing load of the device can be reduced.
The collation flow (S109 to S111) and the image processing flow (S101 to S108) may also be processed in parallel, or the image processing flow may be executed before the collation flow. The order of the image processing flow shown in FIG. 7 (S101 to S108) and the collation flow shown in FIG. 8 (S109 to S111) is not limited.
FIG. 9 shows an example of the processing flow for displaying images. The output unit 180 outputs the three-dimensional registered images set as targets for visual collation and the corrected image produced by the image processing unit 150 (S112).
The display unit 310 displays the three-dimensional registered images output by the output unit 180 and the corrected image produced by the image processing unit 150 (S113).
The operation reception unit 320 accepts the input of the user's judgment (S114). The operation it accepts is, for example, selecting, from the three-dimensional registered images shown on the display unit 310, the image showing the same person as the corrected image being collated.
The display flow shown in FIG. 9 (S112 to S113) is executed after the image processing flow shown in FIG. 7 (S101 to S108) and the collation flow shown in FIG. 8 (S109 to S111).
As a result, even when the person being collated is wearing lenses, changes in appearance and facial impression caused by the lenses can be reduced, and the user can carry out visual collation efficiently and accurately.
<Second Embodiment>
The collation assistance system 1 of the present embodiment will now be described. The second embodiment differs from the first in that the processing device 100 includes a three-dimensional image processing unit 190 and corrects the three-dimensional registered images based on the estimated lens power information. Descriptions of parts shared with the first embodiment are omitted.
FIG. 10 is a functional block diagram showing the configuration of the collation assistance system 1 in the present embodiment. The processing device 100 includes the input unit 110, the detection unit 120, the extraction unit 130, the power estimation unit 140, the image processing unit 150, the storage unit 160, the collation unit 170, the output unit 180, and the three-dimensional image processing unit 190.
The three-dimensional image processing unit 190 corrects the image around the person's eyes in the three-dimensional registered images that the collation unit 170 has set as targets for visual collation, and generates corrected three-dimensional images corresponding to those registered images (three-dimensional corrected images). Specifically, it corrects the designated three-dimensional registered images, namely those set by the collation unit 170 as targets for visual collation, according to the lens power information estimated by the power estimation unit 140. Known three-dimensional computer graphics techniques are used for this correction. As an example of correction according to lens power: eyeglasses for correcting myopia use concave lenses, so the eyes of a person wearing them look smaller than without them because of refraction. In this case, the three-dimensional image processing unit 190 uses the estimated power of the concave lens to correct the image around the eyes so as to reproduce the change in facial appearance, refracted through the lens, that would be seen if the person shown in the three-dimensional registered image wore that concave lens.
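For intuition only: one standard first-order way to approximate how much a spectacle lens shrinks or enlarges the eyes is the power factor of spectacle magnification, M = 1 / (1 - d * P), with vertex distance d in meters and lens power P in diopters. The patent does not commit to this formula; the sketch below simply applies such a factor.

```python
def apparent_eye_scale(lens_power_diopters, vertex_distance_m=0.012):
    """First-order spectacle magnification (power factor only):
    M = 1 / (1 - d * P). For a concave lens (P < 0) this is < 1,
    i.e. the eyes look smaller, matching the myopia example above."""
    return 1.0 / (1.0 - vertex_distance_m * lens_power_diopters)

# A -4 D myopia lens worn 12 mm from the eye shrinks it to about 95%.
print(apparent_eye_scale(-4.0))
```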
The storage unit 160 stores a plurality of three-dimensional registered images in association with registrant information; examples of its items include the registrant's identification information, name, and date and time of registration. The storage unit 160 may also store the three-dimensional corrected images generated by the three-dimensional image processing unit 190.
The collation unit 170 collates the person shown in the two-dimensional image extracted by the extraction unit 130 against the three-dimensional registered images of the plurality of registrants stored in the storage unit 160 and calculates the similarity between the person imaged by the imaging unit 210 and each registrant. The collation unit 170 sets the three-dimensional registered images whose similarity is at or above a threshold as targets for visual collation by the user; more than one image may be set as a target.
The output unit 180 outputs the three-dimensional corrected images of the persons set as targets for visual collation, together with the image extracted by the extraction unit 130. A three-dimensional corrected image is a three-dimensional image generated by the three-dimensional image processing unit 190.
The display unit 310 displays the three-dimensional corrected images of the persons set as targets for visual collation and the two-dimensional image extracted by the extraction unit 130. It may display a three-dimensional corrected image and the two-dimensional image at the same time, display several three-dimensional images simultaneously, or display them one after another according to the obtained similarity. It may display several two-dimensional images simultaneously, and it may display, for example, the three-dimensional corrected image and the two-dimensional image on the same screen.
The user judges whether at least one of the three-dimensional images displayed by the display unit 310 and the two-dimensional image show the same person, and enters the judgment into the operation reception unit 320. The operation reception unit 320 accepts the input of the user's judgment; the operation it accepts is, for example, selecting, from the three-dimensional images shown on the display unit 310, the image showing the same person as the two-dimensional image being collated.
Next, the operation of the collation assistance system of the present embodiment will be described with reference to FIGS. 11 and 12. FIG. 11 shows an example of the image processing flow in the present embodiment; descriptions overlapping the first embodiment are omitted. In the present embodiment, the collation flow performed by the device is the same as in the first embodiment (S109 to S111).
The power estimation unit 140 estimates the power of the eyeglasses worn by the person in the image extracted by the extraction unit 130 (S107). Next, using the estimated lens power information, the three-dimensional image processing unit 190 corrects the image around the person's eyes in the three-dimensional registered images that the collation unit 170 has set as targets for visual collation, and generates three-dimensional corrected images (S208).
The storage unit 160 stores the three-dimensional corrected images in association with the registrant information (S209).
The output unit 180 outputs the three-dimensional corrected images of the persons set as targets for visual collation and the image extracted by the extraction unit 130 (S212).
The display unit 310 displays the three-dimensional corrected images of the persons set as targets for visual collation and the two-dimensional image extracted by the extraction unit 130 (S213). It may display a three-dimensional corrected image and the two-dimensional image at the same time, display several three-dimensional images simultaneously or one after another according to similarity, and display several two-dimensional images simultaneously.
The user judges whether at least one of the displayed three-dimensional images and the two-dimensional image show the same person, and enters the judgment into the operation reception unit 320, which accepts the input (S114).
In the present embodiment, the series of image processing steps shown in FIGS. 7 and 11 (S101 to S107, S208 to S209) may be executed after the collation flow shown in FIG. 8 (S109 to S111). In that case the lens power is estimated only for images in which a lens was recognized, so the processing load of the device can be reduced.
The collation flow (S109 to S111) and the image processing flow (S101 to S107, S208 to S209) may also be processed in parallel, or the image processing flow may be executed before the collation flow. The order of the image processing flow shown in FIGS. 7 and 11 (S101 to S107, S208 to S209) and the collation flow shown in FIG. 8 (S109 to S111) is not limited.
As described above, in the collation assistance system of the present embodiment, when the imaged person is wearing eyeglasses, the three-dimensional registered images held in the database are corrected according to the change in appearance caused by the eyeglasses. As a result, even when the person being collated is wearing eyeglasses, changes in appearance and facial impression caused by the eyeglasses can be reduced, and the user can carry out visual collation efficiently and accurately.
<Third Embodiment>
The configuration of the collation assistance system 1 of the present embodiment will be described with reference to FIG. 13, a functional block diagram of the collation assistance system 1 in the present embodiment. The collation assistance system 1 includes the detection unit 120 and the output unit 180.
The detection unit 120 detects, among at least one input image capturing a person, an input image that includes a person wearing a lens.
The output unit 180 outputs a corrected image, that is, an image obtained by correcting the input image detected by the detection unit 120 using the lens power information estimated from that input image, together with a three-dimensional registered image to be collated against the corrected image. The corrected image is, for example, an image in which the position and size of the eyes seen through the lens in the input image have been corrected.
Next, the operation of the collation assistance system of the present embodiment will be described with reference to FIG. 14, a flowchart showing an example of the processing in the present embodiment.
The detection unit 120 detects, among the captured images, an input image that includes a person wearing a lens (S307).
The output unit 180 outputs the corrected image, obtained by correcting the input image using the lens power information estimated from the input image detected by the detection unit 120, together with the three-dimensional registered image to be collated against it (S308).
As a result, even when the person being collated is wearing eyeglasses, changes in appearance and facial impression caused by the eyeglasses can be reduced, and the user can carry out visual collation efficiently and accurately.
<Fourth Embodiment>
The configuration of the collation assistance system 1 of the present embodiment will be described with reference to FIG. 13, as in the third embodiment; the present embodiment differs from the third in the function of the output unit 180. FIG. 13 is a functional block diagram of the collation assistance system 1 in the present embodiment. The collation assistance system 1 includes the detection unit 120 and the output unit 180.
The detection unit 120 detects, among at least one input image capturing a person, an input image that includes a person wearing eyeglasses having lenses.
The output unit 180 outputs a three-dimensional corrected image, obtained by correcting a three-dimensional registered image using the lens power estimated from the input image, together with that input image. The three-dimensional corrected image output by the output unit 180 is, for example, an image in which the position and size of the eyes of the three-dimensional registered image to be collated against the input image have been corrected.
Next, the operation of the collation assistance system of the present embodiment will be described with reference to FIG. 15, a flowchart showing an example of the processing in the present embodiment.
The detection unit 120 detects, among at least one input image capturing a person, an input image that includes a person wearing eyeglasses having lenses (S407).
The output unit 180 outputs the three-dimensional corrected image, obtained by correcting the three-dimensional registered image using the lens power estimated from the input image, together with the input image (S408).
As a result, even when the person being collated is wearing eyeglasses, changes in appearance and facial impression caused by the eyeglasses can be reduced, and images can be generated that allow the user to carry out visual collation efficiently and accurately.
(Hardware configuration)
Next, an example of a hardware configuration that realizes the processing device 100, the imaging device 200, and the user terminal 300 of each embodiment described above using one or more computers will be described. Each functional unit of the processing device 100, the imaging device 200, and the user terminal 300 is realized by any combination of hardware and software, centered on at least one CPU (Central Processing Unit) of any computer, at least one memory, a program loaded into that memory, a storage unit such as at least one hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that there are various modifications of this realization method and apparatus. The storage unit can store not only programs installed before the device is shipped but also programs downloaded from storage media such as optical disks, magneto-optical disks, and semiconductor flash memories, or from servers on the Internet.
FIG. 16 is a block diagram illustrating the hardware configuration of the processing device 100, the imaging device 200, and the user terminal 300. As shown in FIG. 16, each of them has a processor 1A, a memory 2A, an input/output interface 3A, a peripheral circuit 4A, a communication interface 5A, and a bus 6A. The peripheral circuit 4A includes various modules; the processing device 100, the imaging device 200, and the user terminal 300 need not have the peripheral circuit 4A. Each of the processing device 100, the imaging device 200, and the user terminal 300 may be composed of a plurality of physically and/or logically separate devices, in which case each of those devices can have the above hardware configuration.
The bus 6A is a data transmission path over which the processor 1A, the memory 2A, the input/output interface 3A, the peripheral circuit 4A, and the communication interface 5A exchange data. The processor 1A is an arithmetic processing unit such as a CPU, a GPU (Graphics Processing Unit), or a microprocessor, and can execute processing according to the various programs stored in the memory 2A.
The memory 2A is, for example, a memory such as a RAM (Random Access Memory) or a ROM (Read Only Memory), and stores programs and various data.
The input/output interface 3A includes interfaces for acquiring information from input devices, external devices, external storage units, external sensors, cameras, and the like, and interfaces for outputting information to output devices, external devices, external storage units, and the like. Input devices include, for example, a touch panel, keyboard, mouse, microphone, and camera; output devices include, for example, a display, speaker, printer, and lamp.
The processor 1A can issue commands to each module and perform computations based on their results.
The communication interface 5A enables the processing device 100, the imaging device 200, and the user terminal 300 to communicate with external devices and with one another. Some functions of the processing device 100, the imaging device 200, and the user terminal 300 may be implemented by a computer.
<Modification example>
Modifications applicable to the embodiments described above will now be explained. If the power estimation unit 140 fails to estimate the lens power in a target captured image, the images captured before and after it may be processed again. The image processing unit 150 may also process the captured image so as to remove the rims of the eyeglasses.
In another modification, the imaging unit 210 may include a device that acquires three-dimensional depth information. In that case, the calculation unit 141 need not calculate the distance between the imaging unit 210 and the person (or the person's face) from the two-dimensional image, and the estimation unit 143 may estimate the lens power using the depth information acquired by the imaging unit 210. The collation unit 170 may also be designed to collate the corrected image processed by the image processing unit 150 against the three-dimensional registered images, in addition to collating the person shown in the two-dimensional image extracted by the extraction unit 130 against the plurality of three-dimensional registered images stored in the storage unit 160.
In yet another modification applicable to the above embodiments, the imaging device 200 may execute some or all of the processing performed by the processing device 100. For example, as shown in FIG. 18, the imaging device 200 may include the detection unit 120, the extraction unit 130, and the power estimation unit 140 in addition to the imaging unit 210. In this case the imaging device 200 estimates the lens power, and the image processing unit 150 of the processing device 100 may acquire the lens power information from the imaging device 200; the imaging device 200 may execute the processing of S101 to S107. The imaging device 200 may also extract only captured images in which the contour difference of the person shown is at or above a threshold, and make only those the target of the power estimation processing and of transmission to the processing device 100. These modifications can reduce the processing load of the processing device 100 and the load of transmission processing in the collation assistance system of the present invention.
The configurations of the embodiments described above may be combined, and some components may be replaced. The configuration of the present invention is not limited to the embodiments described above, and various changes may be made without departing from the gist of the present invention. Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
(Appendix 1)
A collation assistance device comprising:
a detection unit that detects, among at least one input image capturing a person, an input image including a person wearing a lens; and
an output unit that outputs a corrected image, which is an image in which the position and size of the eyes seen through the lens have been corrected using the power of the lens estimated in the input image, and a three-dimensional registered image to be collated against the corrected image.
(Appendix 2)
The collation assistance device according to Appendix 1, further comprising a display unit that displays the corrected image and the three-dimensional registered image on the same screen.
(Appendix 3)
The collation assisting device according to Appendix 2, wherein the display unit displays a three-dimensional registered image whose similarity, obtained by collating the input image corresponding to the corrected image with the three-dimensional registered images of a plurality of registrants, is equal to or greater than a threshold value.
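To make the thresholding of Appendices 2 and 3 concrete, here is a minimal, hypothetical sketch of selecting which three-dimensional registered images the display unit would show; the threshold value and the `matcher` interface are assumptions, not part of the disclosure.

```python
SIMILARITY_THRESHOLD = 0.8  # assumed operating point, not from the disclosure


def candidates_to_display(input_image, registered_3d_images, matcher):
    """Return the 3D registered images whose similarity to the input image
    meets the threshold, for side-by-side display with the corrected image."""
    return [registered for registered in registered_3d_images
            if matcher.similarity(input_image, registered) >= SIMILARITY_THRESHOLD]
```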
(Appendix 4)
A collation assisting device comprising:
a detection unit that detects, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and
an output unit that outputs a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
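Appendix 4 works in the opposite direction: instead of undoing the lens effect on the input image, the eye region of the rendered three-dimensional registered image is scaled by the assumed magnification M itself, so that it matches the spectacled appearance. A hypothetical variant, reusing the `rescale_eye_region` helper sketched after Appendix 1 and the same assumed power-factor model:

```python
def simulate_lens_on_registered(rendered, eye_box, power_d,
                                vertex_distance_m=0.012):
    """Shrink (P < 0) or enlarge (P > 0) the eyes of the rendered 3D
    registered image by M = 1 / (1 - d * P), so the registered image matches
    how the spectacled person appears in the input image."""
    magnification = 1.0 / (1.0 - vertex_distance_m * power_d)
    return rescale_eye_region(rendered, eye_box, magnification)
```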
(Appendix 5)
The collation assisting device according to Appendix 4, further comprising a display unit that displays the three-dimensional corrected image and the input image on the same screen.
(Appendix 6)
The collation assisting device according to Appendix 5, wherein the display unit displays the three-dimensional corrected image corresponding to a three-dimensional registered image whose similarity, obtained by collation with the input image, is equal to or greater than a threshold value.
(Appendix 7)
The collation assisting device according to any one of Appendices 1 to 6, further comprising an estimation unit that estimates the power of the lens using the positional relationship between the contour appearing inside the lens and the contour appearing outside the lens in the input image, and the positional relationship between the person wearing the lens and the imaging unit that captured that person.
(Appendix 8)
The collation assisting device according to Appendix 7, wherein, among a plurality of input images capturing a person, the estimation unit estimates the power of the lens using an image in which the distance between the contours appearing inside and outside the lens worn by the person is equal to or greater than a threshold value.
(Appendix 9)
The collation assisting device according to Appendix 7 or 8, wherein the estimation unit estimates the power of the lens appearing in an input image whose similarity, obtained by collation with any of the three-dimensional registered images of a plurality of registrants, is equal to or greater than a threshold value.
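One plausible reading of the estimation in Appendices 7 to 9, offered purely as an illustrative assumption: treat the face contour as sitting just behind the lens, so that a lens of power P displaces it sideways by roughly h·d·P (h: distance from the optical centre, d: vertex distance), and recover P from the pixel gap between the contour inside the lens (VFL) and the contour outside it (FL). The sketch below assumes a frontal pose and small angles, and ignores the incident angle Ψ and face direction Θ that the described estimation also uses; all parameter names are hypothetical.

```python
import math


def estimate_lens_power(gap_px: float, offcentre_px: float, image_width_px: int,
                        view_angle_deg: float, subject_distance_m: float,
                        vertex_distance_m: float = 0.012) -> float:
    """Rough lens-power estimate (diopters) from one image, assuming the
    apparent contour shift behind a thin lens is about h * d * P.

    gap_px       -- pixel distance between the contours inside (VFL) and
                    outside (FL) the lens
    offcentre_px -- pixel distance from the lens optical centre to the contour
    """
    # Metres spanned by one pixel at the subject's distance, derived from the
    # horizontal angle of view and the number of pixels across the image.
    half_angle_rad = math.radians(view_angle_deg) / 2.0
    metres_per_px = (2.0 * subject_distance_m * math.tan(half_angle_rad)
                     / image_width_px)

    shift_m = gap_px * metres_per_px        # apparent sideways contour shift
    h_m = offcentre_px * metres_per_px      # off-centre distance on the lens
    return shift_m / (h_m * vertex_distance_m)  # P = shift / (h * d)
```

Under these assumptions, a 1 mm apparent shift at 2 cm from the optical centre with a 12 mm vertex distance gives 0.001 / (0.02 × 0.012) ≈ 4.2 diopters, a plausible magnitude for strong spectacles.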
(Appendix 10)
A collation assisting method comprising:
detecting, from at least one input image capturing a person, an input image including a person wearing a lens; and
outputting a corrected image, which is an image in which the position and size of the eyes appearing through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
(Appendix 11)
A program storage medium storing a program for causing a computer to execute:
a process of detecting, from at least one input image capturing a person, an input image including a person wearing a lens; and
a process of outputting a corrected image, which is an image in which the position and size of the eyes appearing through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
(Appendix 12)
A collation assisting method comprising:
detecting, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and
outputting a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
(Appendix 13)
A program storage medium storing a program for causing a computer to execute:
a process of detecting, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and
a process of outputting a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
Reference Signs List
1 Collation assistance system
100 Processing device
110 Input unit
120 Detection unit
130 Extraction unit
140 Power estimation unit
141 Calculation unit
142 Learning model storage unit
143 Estimation unit
150 Image processing unit
160 Storage unit
170 Collation unit
180 Output unit
190 Three-dimensional image processing unit
200 Imaging device
210 Imaging unit
300 User terminal
310 Display unit
320 Operation reception unit
1A Processor
2A Memory
3A Input/output interface
4A Peripheral circuit
5A Communication interface
6A Bus
E1 First scanning distance
E2 Second scanning distance
E3, E4, FL Contour of face
VFL Contour of face within lens
GL1, GL2 Lens
Ψ Angle of incidence
Θ Face direction of person
Φ Angle of view
xs First pixel distance
c Second pixel distance
XL Number of pixels of the captured image
P Camera lens
CS CCD sensor
Claims (13)
- A collation assisting device comprising:
a detection unit that detects, from at least one input image capturing a person, an input image including a person wearing a lens; and
an output unit that outputs a corrected image, which is an image in which the position and size of the eyes appearing through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
- The collation assisting device according to claim 1, further comprising a display unit that displays the corrected image and the three-dimensional registered image on the same screen.
- The collation assisting device according to claim 2, wherein the display unit displays a three-dimensional registered image whose similarity, obtained by collating the input image corresponding to the corrected image with the three-dimensional registered images of a plurality of registrants, is equal to or greater than a threshold value.
- A collation assisting device comprising:
a detection unit that detects, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and
an output unit that outputs a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
- The collation assisting device according to claim 4, further comprising a display unit that displays the three-dimensional corrected image and the input image on the same screen.
- The collation assisting device according to claim 5, wherein the display unit displays the three-dimensional corrected image corresponding to a three-dimensional registered image whose similarity, obtained by collation with the input image, is equal to or greater than a threshold value.
- The collation assisting device according to any one of claims 1 to 6, further comprising an estimation unit that estimates the power of the lens using the positional relationship between the contour appearing inside the lens and the contour appearing outside the lens in the input image, and the positional relationship between the person wearing the lens and the imaging unit that captured that person.
- The collation assisting device according to claim 7, wherein, among a plurality of input images capturing a person, the estimation unit estimates the power of the lens using an image in which the distance between the contours appearing inside and outside the lens worn by the person is equal to or greater than a threshold value.
- The collation assisting device according to claim 7 or 8, wherein the estimation unit estimates the power of the lens appearing in an input image whose similarity, obtained by collation with any of the three-dimensional registered images of a plurality of registrants, is equal to or greater than a threshold value.
- A collation assisting method comprising: detecting, from at least one input image capturing a person, an input image including a person wearing a lens; and outputting a corrected image, which is an image in which the position and size of the eyes appearing through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
- A program storage medium storing a program for causing a computer to execute: a process of detecting, from at least one input image capturing a person, an input image including a person wearing a lens; and a process of outputting a corrected image, which is an image in which the position and size of the eyes appearing through the lens are corrected using the power of the lens estimated from the input image, and a three-dimensional registered image that is a collation target of the corrected image.
- A collation assisting method comprising: detecting, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and outputting a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
- A program storage medium storing a program for causing a computer to execute: a process of detecting, from at least one input image capturing a person, an input image including a person wearing spectacles having a lens; and a process of outputting a three-dimensional corrected image, in which the eye position and size of a three-dimensional registered image that is a collation target of the input image are corrected using the power of the lens estimated from the input image, and the input image.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/010980 WO2021181642A1 (en) | 2020-03-13 | 2020-03-13 | Collation assistance device, collation assistance method, and program storage medium |
JP2022505677A JP7439899B2 (en) | 2020-03-13 | 2020-03-13 | Verification auxiliary device, verification auxiliary method and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/010980 WO2021181642A1 (en) | 2020-03-13 | 2020-03-13 | Collation assistance device, collation assistance method, and program storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021181642A1 true WO2021181642A1 (en) | 2021-09-16 |
Family
ID=77671055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/010980 WO2021181642A1 (en) | 2020-03-13 | 2020-03-13 | Collation assistance device, collation assistance method, and program storage medium |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7439899B2 (en) |
WO (1) | WO2021181642A1 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014059819A (en) * | 2012-09-19 | 2014-04-03 | Toshiba Corp | Display device |
JP2015025859A (en) * | 2013-07-24 | 2015-02-05 | 富士通株式会社 | Image processing apparatus, electronic device, eye-glasses characteristics determination method, and eye-glasses characteristics determination program |
JP2015138333A (en) * | 2014-01-21 | 2015-07-30 | 富士通株式会社 | Display control program, display control apparatus, and display control system |
JP3204175U (en) * | 2016-03-01 | 2016-05-19 | 株式会社メディックエンジニアリング | Image matching device |
JP2019121143A (en) * | 2017-12-29 | 2019-07-22 | 京セラドキュメントソリューションズ株式会社 | Image processing apparatus, image processing method and image processing program |
Also Published As
Publication number | Publication date |
---|---|
JP7439899B2 (en) | 2024-02-28 |
JPWO2021181642A1 (en) | 2021-09-16 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20923755; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 2022505677; Country of ref document: JP; Kind code of ref document: A
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20923755; Country of ref document: EP; Kind code of ref document: A1