WO2021157214A1

WO2021157214A1 - Information processing device, method for extracting silhouette, and program

Info

Publication number: WO2021157214A1
Application number: PCT/JP2020/047061
Authority: WO
Inventors: 真也阪田
Original assignee: オムロン株式会社
Priority date: 2020-02-04
Filing date: 2020-12-16
Publication date: 2021-08-12
Also published as: JP2021124868A

Abstract

An information processing device according to the present invention comprises: an acquisition means for acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information of distances between an image pickup device and objects present in an imaging range which includes a target object; and an extraction means for extracting the silhouette of the target body from the first image.

Description

Information processing device, silhouette extraction method, program

The present invention relates to an information processing device, a silhouette extraction method, and a program.

It is known that a person's walking differs depending on the person. Therefore, there is a technique called gait authentication that extracts a silhouette (outer shape; contour) of a person from an image of the person and authenticates the person. In gait authentication, it is possible to authenticate an individual by determining the stride length, the size of the arm swing, the speed of the walking pitch, and the degree of bending of the back based on the silhouette. By performing gait authentication in this way, it is possible to authenticate an individual even when the resolution of the captured image is low or when the face is not shown.

In Patent Document 1, a method of extracting a silhouette by taking a difference between an optical image of a person walking and a background image, and a method of binarizing pixels in an optical image of a person walking are performed. It describes how to extract the silhouette by doing so.

Japanese Unexamined Patent Publication No. 4-33066

However, in the technique described in Patent Document 1, when the color of the clothes worn by the target person and the background color are similar, it is not possible to appropriately take a difference or binarize, and accurately. The silhouette may not be extracted.

Therefore, an object of the present invention is to provide a technique for extracting a silhouette of an object body more accurately even if the colors of the area of the object body and the background area are similar.

In order to achieve the above object, the present invention adopts the following configuration.

That is, the information processing apparatus according to one aspect of the present invention acquires a distance image as a first image, which is a two-dimensional image having information on the distance between an object existing in an imaging range including an object and the imaging apparatus as a pixel value. It has an acquisition means for extracting the silhouette of the object from the first image, and an extraction means for extracting the silhouette of the object.

According to the above configuration, since the silhouette of the target body is extracted from the distance image that does not depend on the color of the target body or the background, it is possible to extract the silhouette in an appropriate range as compared with the case of extracting the silhouette from the optical image or the like. .. It can be said that the silhouette is the outer shape and contour of the object.

The information processing apparatus according to the one aspect further includes a detection means for detecting a target body region including the target body from the first image, and the extraction means may extract the silhouette from the target body region. good. By extracting the silhouette from the target body region after detecting the target body region, a two-step range determination process is performed to extract the silhouette, so that the silhouette can be extracted with higher accuracy. The target body region may have any shape such as a rectangle, a circle, or a polygon as long as it is a region including the target body.

In the information processing apparatus according to the one aspect, the extraction means uses a region indicating a distance within a predetermined value as a silhouette from the target body region so that the difference from the representative value of the distance indicated by the target body region is included within a predetermined value. It may be extracted. The representative value can be the minimum value, the mode value, or the average value. For example, since the object is more likely to be closer to the image pickup device than the background, if the representative value is the minimum value, the silhouette can be extracted with high accuracy.

In the information processing device according to the above aspect, the representative value may be the mode value. In the target body region, it is considered that the size of the region occupied by the target body is large. Therefore, if the region included in the distance within a predetermined value from the most frequent distance is extracted, the silhouette can be extracted with high accuracy.

In the information processing device according to the one aspect, a distance image not including the object, which is the same image pickup range as when the image pickup device was imaged for acquiring the first image, is recorded as a second image. The detection means may further include a recording means, and the detection means may detect the target region based on the difference between the first image and the second image. There is a high possibility that there is no difference between a distance image including an object and a distance image not including the object, such as an image of the same shooting range, except for the area where the object exists. Therefore, according to this configuration, the target body region can be detected with higher accuracy. It also makes it possible to extract silhouettes without using optical images.

In the information processing apparatus according to the one aspect, an optical image not including the object, which is the same imaging range as when the imaging apparatus imaged to acquire the first image, is recorded as a third image. Further having a recording means, the acquisition means acquires an optical image in which the imaging range is captured at the same time as the first image as a fourth image, and the detection means includes the third image and the fourth image. The target region may be detected based on the difference. There is a high possibility that there is no difference between the optical image including the object and the optical image not including the object, such as those captured in the same shooting range, except for the region where the object exists. Therefore, according to this configuration, the target body region can be detected with higher accuracy.

In the information processing apparatus according to the one aspect, the first image includes at least a first frame and a second frame, and the detection means is based on the difference between the first frame and the second frame. The target area may be detected. There is a high possibility that there is no difference in the region other than the target body between the first frame and the second frame. Therefore, according to this configuration, the target body region can be detected with higher accuracy. Further, since it is not necessary to record a distance image, an optical image, or the like in which the object is not captured in the recording means, the recording capacity can be reduced.

In the information processing apparatus according to the one aspect, even if the detection means acquires the target body region from the first image by using a learned model in which machine learning for detecting the target body region is performed. good.

The information processing device according to the above aspect may further have a feature amount extraction means for extracting the feature amount in the silhouette.

The information processing apparatus according to the above aspect may further include a collating means for collating the feature amount extracted by the feature amount extracting means with the feature amount acquired in advance. Thereby, it is possible to determine whether or not the target body indicated by the feature amount acquired in advance and the target body captured in the first image are the same, the high possibility that they are the same, and the like.

In the information processing apparatus according to the one aspect, the first image includes a plurality of frames, the extraction means extracts silhouettes from each of the plurality of frames, and the feature amount extraction means extracts the extraction. The feature amount may be extracted based on a plurality of silhouettes extracted by the means.

In the information processing apparatus according to the one aspect, the feature amount extracting means may extract an image obtained by averaging a plurality of silhouettes extracted by the extraction means as the feature amount. The object may be a walking person. According to such a configuration, the feature amount can indicate the walking width, the swing width of the hand, the walking speed, etc. when a person is walking. Therefore, more accurate gait authentication can be realized.

The silhouette extraction method according to one aspect of the present invention acquires a distance image as a first image, which is a two-dimensional image having information on the distance between an object existing in an imaging range including an object and an imaging device as a pixel value. It has a step and an extraction step of extracting the silhouette of the object from the first image.

The present invention may be regarded as a control device having at least a part of the above means, or may be regarded as a silhouette extraction device or a silhouette extraction system. Further, the present invention may be regarded as an information processing method including at least a part of the above processing and a control method of an information processing device. Further, the present invention can also be regarded as a program for realizing such a method and a recording medium in which the program is recorded non-temporarily. It should be noted that each of the above means and treatments can be combined with each other as much as possible to form the present invention.

According to the present invention, when extracting the silhouette of the object, the silhouette can be extracted more accurately even if the color of the area of the object and the color of the background area are similar.

FIG. 1 is a configuration diagram of an information processing system. FIG. 2A is a diagram showing an optical image including a person. FIG. 2B is a diagram showing a distance image in the imaging range of the optical image shown in FIG. 2A. FIG. 2C is a diagram showing an optical image when no person is included in the imaging range of the optical image shown in FIG. 2A. FIG. 2D is a diagram showing an image showing a silhouette extracted from the distance image shown in FIG. 2B. 3A to 3C are diagrams showing images of each frame of the distance image. FIG. 3D is a diagram showing feature quantities extracted from the images shown in FIGS. 3A to 3C. FIG. 4 is a flowchart showing the processing of the information processing apparatus.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.

<Application example>
Hereinafter, the information processing system according to the present embodiment will be described. The information processing system extracts a silhouette of a person from a distance image which is a two-dimensional image having information on the distance between an object existing in the imaging range 10 including a person (moving object) as an object and the imaging device 100 as a pixel value. .. Unlike optical images, distance images are not affected by color, so it is possible to accurately acquire (extract) human silhouettes. In this embodiment, a case where gait authentication is performed using a distance image will be described, and the distance image will be described as a moving image (video) composed of a plurality of frames.

[Information processing system configuration]
As shown in FIG. 1, the information processing system according to the present embodiment includes an image pickup device 100 and an information processing device 200. The information processing device 200 may be present inside the image pickup device 100, or the image pickup device 100 may be present inside the information processing device 200.

By imaging the imaging range 10, the imaging device 100 acquires a distance image showing the distance between the object existing in the imaging range 10 and the imaging device 100. Here, the image pickup device 100 can acquire the distance between the object existing in the image pickup range 10 and the image pickup device 100 by the parallax of the two optical images taken by the TOF (Time Of Flight) sensor or the stereo camera. can. When the imaging device 100 has an infrared light irradiation unit and an imaging unit, the irradiation angle of the light emitted from the irradiation unit, the angle at which the light reflected by the object is inserted into the imaging unit, and the irradiation unit Depending on the distance between the imaging unit, the distance between the object existing in the imaging range 10 and the imaging device 100 may be acquired. Further, the imaging device 100 may be able to acquire an optical image in the imaging range 10 at the same time as acquiring the distance image. The image pickup apparatus 100 outputs the acquired distance image to the information processing apparatus 200.

The distance image is an image composed of pixels having pixel values according to the distance. For example, as shown in FIG. 2B, the closer the distance from the image pickup apparatus 100 is, the brighter the pixels are, and the farther the distance is, the darker the pixels are. The distance image may be represented by bright pixels as the distance from the image pickup apparatus 100 increases, and may be represented by dark pixels as the distance from the image pickup apparatus 100 increases. Further, the distance image may be shown in a color (color) according to the distance from the image pickup apparatus 100. Further, the distance image may be an optical image in which each pixel has a numerical value (information) of the distance from the image pickup apparatus 100.

(Configuration of information processing device)
The information processing device 200 includes an acquisition unit 201, a detection unit 202, a silhouette extraction unit 203, a feature amount extraction unit 204, a collation unit 205, an output unit 206, and a recording unit 207.

The acquisition unit 201 acquires a distance image from the image pickup device 100. Here, when the image pickup apparatus 100 acquires the optical image at the same time as the distance image, the acquisition unit 201 also acquires the optical image from the image pickup apparatus 100. Any communication method may be used for communication between the image pickup apparatus 100 and the acquisition unit 201 (information processing apparatus 200), and either wired communication or wireless communication may be used.

The detection unit 202 detects a person in a distance image. Here, "detecting a person" means identifying (detecting) an area including a person in a distance image. The detection unit 202 does not specify the silhouette (outer shape; contour) of the person, but specifies, for example, a rectangular area including the person. That is, the detection unit 202 identifies (detects) a region wider than the region corresponding to the silhouette of the person as a region including the person. The detection unit 202 outputs information on a region including a person in the distance image to the silhouette extraction unit 203. The region to be specified does not have to be rectangular, and may have any shape such as a circle or a polygon.

For example, the detection unit 202 detects the difference (pixels) between the optical image acquired at the same time as the distance image as shown in FIG. 2A and the optical image (background image) obtained by capturing the imaging range 10 excluding people as shown in FIG. 2C. Take values, color differences, etc.). It should be noted that these two images are images obtained by capturing the same imaging range. Then, the detection unit 202 identifies that a person exists in a rectangular range 302 including a region in which the distance difference is equal to or more than a predetermined value in the distance image as shown in FIG. 2B. An optical image of an imaging range 10 that does not include a person as shown in FIG. 2C is acquired in advance by the imaging apparatus 100 and recorded in the recording unit 207. In the following, the area where a person is identified is referred to as a "human area (object area)".

As a method for detecting a person in a distance image, for example, the following examples (1) to (3) may be used.

(1) The detection unit 202 is a difference (1) between the acquired distance image and a distance image (background distance image) having information on the distance between the object existing in the imaging range 10 excluding people and the imaging device 100 as pixel values. Take pixel values, color differences, etc.). It should be noted that these two images are images obtained by capturing the same imaging range. Then, the detection unit 202 identifies (detects) a rectangular region including a region having a difference of a predetermined value or more as a human region. A distance image showing the distance between an object existing in the imaging range 10 excluding people and the imaging device 100 is acquired in advance by the imaging device 100 and recorded in the recording unit 207.

(2) The detection unit 202 takes the difference between the current frame of the acquired distance image and the previous frame, and sets an area including a region having a shorter distance than the previous frame in the current frame as a human region. Specified (detected) as.

(3) The detection unit 202 has a learning model (learned model) learned by machine learning such as deep learning, and by inputting a distance image into the trained model, the human region in the distance image is transmitted from the trained model. To get. The trained model can be generated as follows. First, a plurality of combinations of information on the human region in the distance image and the distance image are input to the learning model as teacher data for learning. Then, a trained model is generated from the teacher data based on the algorithm of SVM or the neural network.

The silhouette extraction unit 203 extracts a person's silhouette (outer shape; contour) from the person area of the distance image. Specifically, the silhouette extraction unit 203 has a range in which the difference from the representative value of the distance indicated by the human area is within a predetermined value (for example, when the representative value indicates 250 cm, the range is 10 cm before and after. A region indicating a distance included in a certain 240 to 260 cm) is extracted as indicating a silhouette of a person. Here, the predetermined value can be, for example, a size smaller than the thickness of the human body (for example, 50 cm). If the silhouette of a person can be extracted from the human area of the distance image, an arbitrary method may be used, for example, extracting the silhouette based on the similarity with the shape of the humanoid stored in advance. .. Here, the representative value can be a mode value, an average value, or a minimum value. The silhouette extraction unit 203 outputs the extracted silhouette information to the feature amount extraction unit 204.

For example, when a human region is specified as shown in FIG. 2B, the silhouette extraction unit 203 determines the distance possessed by the largest number of pixels in the human region as a representative value. Then, the silhouette extraction unit 203 extracts the silhouette of a person from the distance image as shown in FIG. 2D by specifying the area of the pixel indicating the distance within the range in which the difference from the representative value is within a predetermined value.

The feature amount extraction unit 204 extracts the feature amount from the silhouette extracted by the silhouette extraction unit 203. Further, the information processing apparatus 200 has a collation mode and a registration mode. In the registration mode, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207. On the other hand, in the collation mode, the feature amount extraction unit 204 outputs the feature amount to the collation unit 205.

Here, the feature amount extraction unit 204 can also use the silhouette itself extracted by the silhouette extraction unit 203 as the feature amount. Further, in the field of gait authentication, GEI (Gait Energy Image), GEnI (Gait Entropy Image), MGEI (Masked GEI), FDF (Frequency domain Energy), FDF (Frequency domain Features), FDF (Frequency domain Features), CGI (CGI), etc. can.

GEI is an image obtained by averaging consecutive frames. For example, as shown in FIGS. 3A to 3C, when silhouettes are extracted in three frames, the silhouette extraction unit 203 averages the three frames (three silhouettes) to obtain the image shown in FIG. 3D. Extract as GEI. In FIG. 3D, the non-moving (static) area is shown in white, the moving (dynamic) area is shown in gray (between white and black), and the non-human area is black. It is shown by. By using GEI as a feature amount, it becomes possible for the feature amount to indicate the walking width, the swing width of the hand, the walking speed, etc. when a person is walking. That is, if GEI is used as a feature amount, gait authentication can be performed from the captured image (distance image).

Further, for example, GEnI is an image in which a dynamic region is converted to high brightness and a static region is converted to low brightness in GEI. MGEI is an image obtained by extracting only a dynamic part from GEI.

By extracting the silhouette from the distance image in this way, the silhouette can be extracted without being affected by the person or background color that is affected when the silhouette is extracted from the optical image. That is, the silhouette can be extracted with higher accuracy.

The collation unit 205 collates (matches) the feature amount acquired from the feature amount extraction unit 204 with the feature amount registered in the recording unit 207. For example, when the difference between the two feature quantities is within the threshold value, the collating unit 205 determines that the person captured in the distance image and the person imaged for the registered feature quantity are the same person. In other cases, the collating unit 205 determines that the person captured in the distance image and the person imaged for the registered feature amount are not the same person. The collation unit 205 may calculate the probability of the person (probability of being the same person) according to the degree of similarity between the two feature quantities without determining that the person is the same person. The collation unit 205 outputs the collation result to the output unit 206. The determination of whether or not the person is the same person by collation based on the feature amount is not limited to the above-mentioned method, and may be determined by any method.

The output unit 206 notifies the user of the collation result acquired from the collation unit 205. That is, the output unit 206 notifies whether or not the person indicated by the registered feature amount and the person reflected in the distance image are the same, and the probabilities of these two persons. For example, the output unit 206 may display the collation result on the display or notify the collation result by voice. Further, the output unit 206 may output the collation result on paper via a printer.

The recording unit 207 records the feature amount registered by the feature amount extraction unit 204 in the registration mode. When registering the feature amount, the recording unit 207 may record the name, ID, and the like of the person appearing in the distance image of the feature amount in association with the feature amount according to the user input. By associating in this way, the collating unit 205 can identify the person who appears in the distance image that matches the feature amount registered in the recording unit 207. Further, the recording unit 207 may record a program for operating each functional unit. The recording unit 207 includes a ROM (Read-only Memory) for storing important programs as a system, a RAM (Random Access Memory) for storing high-speed access, and an HDD (Hard Disk Drive) for storing a large amount of data. ), Etc., can include a plurality of recording members.

Further, the image pickup device 100 and the information processing device 200 can be configured by, for example, a computer including a CPU (processor), a memory, a storage, and the like. In that case, the configuration shown in FIG. 1 is realized by loading the program stored in the storage into the memory and executing the program by the CPU. Such a computer may be a general-purpose computer such as a personal computer, a server computer, a tablet terminal, or a smartphone, or an embedded computer such as an onboard computer. Alternatively, all or part of the configuration shown in FIG. 1 may be configured by ASIC, FPGA, or the like. Alternatively, all or part of the configuration shown in FIG. 1 may be realized by cloud computing or distributed computing.

[Processing of information processing device]
Hereinafter, the processing of the information processing apparatus 200 will be described with reference to the flowchart of FIG. The flowchart of FIG. 4 is realized by each functional unit executing a program recorded in the recording unit 207. Further, the flowchart of FIG. 4 is started when the imaging device 100 outputs a distance image to the information processing device 200.

In step S1001, the acquisition unit 201 acquires a distance image from the image pickup device 100. The acquisition unit 201 outputs the acquired distance image to the detection unit 202.

In step S1002, the detection unit 202 identifies (detects) a human area from the distance image (detects a person). The detection unit 202 outputs the information of the human area to the silhouette extraction unit 203.

In step S1003, the silhouette extraction unit 203 extracts the silhouette from the human area of the distance image. The silhouette extraction unit 203 outputs silhouette information to the feature amount extraction unit 204.

In step S1004, the feature amount extraction unit 204 extracts the feature amount from the silhouette. When GEI is used as the feature amount, silhouettes for a plurality of frames are required. Therefore, the feature amount extraction unit 204 waits for the plurality of frames of the distance image until the processing of steps S1001 to S1003 is completed. ..

In step S1005, the feature amount extraction unit 204 determines whether the information processing device 200 is in the collation mode or the registration mode. In the collation mode, the feature amount extraction unit 204 outputs the feature amount to the collation unit 205, and the process proceeds to step S1006. In the case of the registration mode, the process proceeds to step S1008.

In step S1006, the collating unit 205 collates the feature amount acquired from the feature amount extracting unit 204 with the feature amount registered in the recording unit 207. The collation unit 205 outputs the collation result to the output unit 206.

In step S1007, the output unit 206 notifies the user of the collation result. The output unit 206 is not limited to notifying the user of the collation result, and may output to, for example, an external device.

In step S1008, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207.

As described above, according to the present embodiment, since the distance image is used, the silhouette can be generated (extracted) with high accuracy without being affected by the color of the ornament or the background. Therefore, it is possible to authenticate an individual with high accuracy.

Although the example of extracting the silhouette of a person has been described in the present embodiment, it is possible to extract the silhouette of an arbitrary object such as an animal or an object, not limited to a person.

Further, the processing method performed by the information processing apparatus 200 can be regarded as an information processing method or a silhouette extraction method. Further, the information processing device 200 can be regarded as a silhouette extraction device and a processing device.

It should be noted that the interpretation of the description of the scope of claims is not limited only by the matters described in the embodiment. The interpretation of the description of the claims also includes the range described so that a person skilled in the art can recognize that the problem of the invention can be solved in consideration of the common general technical knowledge at the time of filing.

(Appendix 1)
An acquisition means (201) for acquiring a distance image as a first image, which is a two-dimensional image having information on the distance between an object existing in the imaging range (10) including an object and the imaging device (100) as a pixel value.
An extraction means (203) for extracting the silhouette of the object from the first image, and
(200).

(Appendix 2)
An acquisition step (S1001) of acquiring a distance image as a first image, which is a two-dimensional image having information on the distance between an object existing in the imaging range (10) including the object and the imaging device (100) as a pixel value.
An extraction step (S1003) for extracting the silhouette of the object from the first image, and
A silhouette extraction method characterized by having.

100: Imaging device, 200: Information processing device, 201: Acquisition unit, 202: Detection unit,
203: Silhouette extraction unit, 204: Feature amount extraction unit, 205: Matching unit, 206: Output unit, 207: Recording unit

Claims

An acquisition means for acquiring a distance image, which is a two-dimensional image having information on the distance between an object existing in an imaging range including an object and an imaging device as a pixel value, as a first image.
An extraction means for extracting the silhouette of the object from the first image, and
An information processing device characterized by having.
It further has a detection means for detecting the target body region including the target body from the first image.
The extraction means extracts the silhouette from the object region.
The information processing apparatus according to claim 1.
The extraction means extracts, as the silhouette, a region indicating a distance in which the difference from the representative value of the distance indicated by the object region is within a predetermined value from the object region.
The information processing apparatus according to claim 2.
The representative value is the mode.
The information processing apparatus according to claim 3.
The imaging device further includes a recording means for recording a distance image that does not include the target body as a second image, which captures the same imaging range as when the first image was captured in order to acquire the first image.
The detection means detects the target region based on the difference between the first image and the second image.
The information processing apparatus according to any one of claims 2 to 4.
The imaging device further includes a recording means for recording an optical image that does not include the object as a third image, which captures the same imaging range as when the first image was captured in order to acquire the first image.
The acquisition means acquires an optical image in which the imaging range is captured at the same time as the first image as a fourth image.
The detection means detects the target region based on the difference between the third image and the fourth image.
The information processing apparatus according to any one of claims 2 to 4.
The first image includes at least a first frame and a second frame.
The detection means detects the target region based on the difference between the first frame and the second frame.
The information processing apparatus according to any one of claims 2 to 4.
The detection means acquires the target body region from the first image by using a trained model that has been machine-learned to detect the target body region.
The information processing apparatus according to any one of claims 2 to 4.
Further having a feature amount extraction means for extracting the feature amount in the silhouette.
The information processing apparatus according to any one of claims 1 to 8.
The feature amount extracting means further includes a collating means for collating the extracted feature amount with the feature amount acquired in advance.
The information processing apparatus according to claim 9.
The first image contains a plurality of frames and contains a plurality of frames.
The extraction means extracts a silhouette from each of the plurality of frames.
The feature amount extraction means extracts the feature amount based on a plurality of silhouettes extracted by the feature amount extraction means.
The information processing apparatus according to claim 9 or 10.
The feature amount extracting means extracts an image obtained by averaging a plurality of silhouettes extracted by the feature amount as the feature amount.
The information processing apparatus according to claim 11.
The object is a walking person,
The information processing apparatus according to any one of claims 1 to 12, characterized in that.
An acquisition step of acquiring a distance image, which is a two-dimensional image having information on the distance between an object existing in an imaging range including an object and an imaging device as a pixel value, as a first image.
An extraction step of extracting the silhouette of the object from the first image, and
A silhouette extraction method characterized by having.
A program for causing a computer to execute each step of the silhouette extraction method according to claim 14.