WO2021157214A1 - Information processing device, method for extracting silhouette, and program - Google Patents

Information processing device, method for extracting silhouette, and program

Info

Publication number
WO2021157214A1
WO2021157214A1 (PCT application PCT/JP2020/047061)
Authority
WO
WIPO (PCT)
Prior art keywords
image
information processing
silhouette
feature amount
distance
Prior art date
Application number
PCT/JP2020/047061
Other languages
French (fr)
Japanese (ja)
Inventor
Shinya Sakata (真也 阪田)
Original Assignee
OMRON Corporation (オムロン株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OMRON Corporation
Publication of WO2021157214A1 publication Critical patent/WO2021157214A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 7/20 Analysis of motion
    • G06T 7/254 Analysis of motion involving subtraction of images

Definitions

  • The present invention relates to an information processing device, a silhouette extraction method, and a program.
  • Gait authentication extracts the silhouette (outer shape; contour) of a person from an image of that person and authenticates the person.
  • In gait authentication, an individual can be authenticated by determining, from the silhouette, the stride length, the size of the arm swing, the speed of the walking pitch, and the degree of bending of the back.
  • Patent Document 1 describes a method of extracting a silhouette by taking the difference between an optical image of a walking person and a background image, and a method of extracting a silhouette by binarizing the pixels of an optical image of a walking person.
  • An object of the present invention is to provide a technique that can extract the silhouette of a target object more accurately even when the color of the target region is similar to that of the background region.
  • To achieve this object, the present invention adopts the following configuration.
  • The information processing apparatus has an acquisition means for acquiring, as a first image, a distance image: a two-dimensional image whose pixel values encode the distance between the imaging apparatus and objects present in an imaging range that includes a target object. It also has an extraction means for extracting the silhouette of the target object from the first image.
  • Because the silhouette is extracted from a distance image, which does not depend on the color of the target object or the background, the silhouette can be extracted over a more appropriate extent than when it is extracted from an optical image or the like. The silhouette can be regarded as the outer shape or contour of the target object.
  • The information processing apparatus may further have a detection means for detecting, from the first image, a target region that contains the target object, and the extraction means may extract the silhouette from that target region.
  • Because the silhouette is extracted from the target region after the target region has been detected, the extent of the silhouette is determined in two steps, so the silhouette can be extracted with higher accuracy.
  • The target region may have any shape, such as a rectangle, a circle, or a polygon, as long as it contains the target object.
  • The extraction means may extract, as the silhouette, the region of the target region whose distance differs from a representative value of the distances in the target region by no more than a predetermined value.
  • The representative value can be the minimum, the mode, or the average. For example, because the target object is likely to be closer to the image pickup device than the background, using the minimum as the representative value allows the silhouette to be extracted with high accuracy.
  • The representative value may also be the mode. Within the target region, the target object is expected to occupy a large area, so extracting the region whose distance lies within a predetermined value of the most frequent distance also allows the silhouette to be extracted with high accuracy.
  • The apparatus may further have a recording means that records, as a second image, a distance image of the same imaging range as the first image but not containing the target object, and the detection means may detect the target region based on the difference between the first image and the second image.
  • Likewise, the recording means may record, as a third image, an optical image of the same imaging range not containing the target object; the acquisition means may acquire, as a fourth image, an optical image of the imaging range captured at the same time as the first image; and the detection means may detect the target region based on the difference between the third image and the fourth image.
  • Between two images of the same imaging range, one containing the target object and one not, there is likely to be no difference except in the region where the target object is present, so these configurations allow the target region to be detected with higher accuracy.
  • The first image may include at least a first frame and a second frame, and the detection means may detect the target region based on the difference between the first frame and the second frame.
  • Between the two frames there is likely to be no difference outside the target object, so the target region can again be detected with higher accuracy; and since no target-free reference image needs to be stored, the recording capacity can be reduced.
  • The detection means may also obtain the target region from the first image using a trained model that has undergone machine learning for detecting the target region.
  • The information processing device may further have a feature amount extraction means for extracting a feature amount from the silhouette.
  • It may also have a collation means for collating the feature amount extracted by the feature amount extraction means against a feature amount acquired in advance. This makes it possible to determine whether the target object represented by the previously acquired feature amount and the target object captured in the first image are the same, or how likely they are to be the same.
  • The first image may include a plurality of frames, the extraction means may extract a silhouette from each of the frames, and the feature amount extraction means may extract the feature amount based on the plurality of extracted silhouettes.
  • The feature amount extraction means may extract, as the feature amount, an image obtained by averaging the silhouettes extracted by the extraction means.
  • The target object may be a walking person. The feature amount can then represent the stride width, the swing width of the hands, the walking speed, and the like while the person is walking, so more accurate gait authentication can be realized.
  • The silhouette extraction method has an acquisition step of acquiring, as a first image, a distance image: a two-dimensional image whose pixel values encode the distance between the imaging device and objects present in an imaging range that includes a target object; and an extraction step of extracting the silhouette of the target object from the first image.
  • The present invention may be regarded as a control device having at least some of the above means, or as a silhouette extraction device or silhouette extraction system. It may also be regarded as an information processing method including at least some of the above processing, as a control method of an information processing device, as a program for realizing such a method, or as a recording medium on which the program is non-transitorily recorded. The above means and processes can be combined with one another wherever possible to constitute the present invention.
  • According to the present invention, when extracting the silhouette of a target object, the silhouette can be extracted more accurately even when the color of the target region is similar to that of the background region.
  • FIG. 1 is a configuration diagram of an information processing system.
  • FIG. 2A is a diagram showing an optical image including a person.
  • FIG. 2B is a diagram showing a distance image in the imaging range of the optical image shown in FIG. 2A.
  • FIG. 2C is a diagram showing an optical image when no person is included in the imaging range of the optical image shown in FIG. 2A.
  • FIG. 2D is a diagram showing an image showing a silhouette extracted from the distance image shown in FIG. 2B.
  • FIGS. 3A to 3C are diagrams showing images of individual frames of the distance image.
  • FIG. 3D is a diagram showing feature quantities extracted from the images shown in FIGS. 3A to 3C.
  • FIG. 4 is a flowchart showing the processing of the information processing apparatus.
  • The information processing system extracts the silhouette of a person from a distance image: a two-dimensional image whose pixel values encode the distance between the imaging device 100 and objects present in the imaging range 10, which includes a person (a moving object) as the target.
  • Unlike an optical image, a distance image is not affected by color, so the silhouette of a person can be acquired (extracted) accurately.
  • This embodiment describes gait authentication using a distance image; the distance image is a moving image (video) composed of a plurality of frames.
  • The information processing system includes an image pickup device 100 and an information processing device 200.
  • The information processing device 200 may be contained in the image pickup device 100, or the image pickup device 100 may be contained in the information processing device 200.
  • The imaging device 100 acquires a distance image showing the distance between the objects present in the imaging range 10 and the imaging device 100.
  • The image pickup device 100 can obtain this distance with a TOF (Time of Flight) sensor, or from the parallax between two optical images taken by a stereo camera.
  • When the imaging device 100 has an infrared irradiation unit and an imaging unit, the distance may instead be derived from the emission angle of the light leaving the irradiation unit, the angle at which the light reflected by the object enters the imaging unit, and the distance between the irradiation unit and the imaging unit.
  • The imaging device 100 may also be able to acquire an optical image of the imaging range 10 at the same time as the distance image.
  • The image pickup apparatus 100 outputs the acquired distance image to the information processing apparatus 200.
  • The distance image is an image composed of pixels whose values depend on distance. For example, as shown in FIG. 2B, pixels closer to the image pickup apparatus 100 are brighter and pixels farther away are darker.
  • The opposite convention may also be used, with farther pixels brighter and closer pixels darker; the distance image may be rendered in colors corresponding to the distance; or it may be an optical image in which each pixel additionally carries a numerical distance value.
  • The information processing device 200 includes an acquisition unit 201, a detection unit 202, a silhouette extraction unit 203, a feature amount extraction unit 204, a collation unit 205, an output unit 206, and a recording unit 207.
  • The acquisition unit 201 acquires the distance image from the image pickup device 100.
  • When the image pickup apparatus 100 acquires an optical image at the same time as the distance image, the acquisition unit 201 also acquires that optical image.
  • Any communication method, wired or wireless, may be used between the image pickup apparatus 100 and the acquisition unit 201 (information processing apparatus 200).
  • The detection unit 202 detects a person in the distance image.
  • Detecting a person means identifying (detecting) the area of the distance image that contains a person.
  • The detection unit 202 does not determine the person's silhouette (outer shape; contour); it identifies, for example, a rectangular area containing the person. That is, it identifies (detects) an area wider than the area corresponding to the silhouette.
  • The detection unit 202 outputs information on the area containing the person to the silhouette extraction unit 203.
  • The identified area need not be rectangular and may have any shape, such as a circle or a polygon.
  • For example, the detection unit 202 takes the difference (in pixel values, colors, and so on) between the optical image acquired at the same time as the distance image, as in FIG. 2A, and an optical image (background image) of the imaging range 10 captured without a person, as in FIG. 2C; these two images cover the same imaging range. It then determines that a person is present in a rectangular range 302 of the distance image, as in FIG. 2B, containing the region where the difference is at least a predetermined value. The person-free optical image is acquired in advance by the imaging apparatus 100 and recorded in the recording unit 207. Hereinafter, the area identified as containing a person is called the "person area (target area)".
  • (1) The detection unit 202 may instead take the difference (in pixel values and so on) between the acquired distance image and a distance image (background distance image) whose pixel values encode the distances in the imaging range 10 without a person; these two images cover the same imaging range. It then identifies (detects) as the person area a rectangular area containing the region where the difference is at least a predetermined value. The background distance image is acquired in advance and recorded in the recording unit 207.
  • (2) The detection unit 202 may take the difference between the current frame of the distance image and the preceding frame, and identify (detect) as the person area an area containing the region where the distance in the current frame is shorter than in the preceding frame.
  • (3) The detection unit 202 may hold a learning model (trained model) trained by machine learning such as deep learning, and obtain the person area by feeding the distance image to the trained model. The trained model can be generated by feeding pairs of a distance image and its person-area information to the learning model as training data, and generating the model from that data based on an algorithm such as an SVM or a neural network.
  • The silhouette extraction unit 203 extracts the person's silhouette (outer shape; contour) from the person area of the distance image. Specifically, it extracts, as the person's silhouette, the region of the person area whose distance lies within a predetermined value of a representative value of the distances in the person area (for example, when the representative value is 250 cm and the predetermined value 10 cm, the range 240 to 260 cm).
  • The predetermined value can be, for example, smaller than the thickness of the human body (for example, 50 cm).
  • The silhouette extraction unit 203 outputs the extracted silhouette information to the feature amount extraction unit 204.
  • For example, the silhouette extraction unit 203 takes as the representative value the distance shared by the largest number of pixels in the person area, and extracts the person's silhouette from the distance image, as in FIG. 2D, by identifying the pixel region whose distance lies within the predetermined value of that representative value.
  • The feature amount extraction unit 204 extracts a feature amount from the silhouette extracted by the silhouette extraction unit 203. The information processing apparatus 200 has a collation mode and a registration mode: in the registration mode, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207; in the collation mode, it outputs the feature amount to the collation unit 205.
  • The feature amount extraction unit 204 can also use the extracted silhouette itself as the feature amount. In the field of gait authentication, a GEI (Gait Energy Image), GEnI (Gait Entropy Image), MGEI (Masked GEI), FDF (Frequency-Domain Features), CGI (Chrono-Gait Image), and the like can also be used as feature amounts.
  • A GEI is an image obtained by averaging consecutive frames. For example, when silhouettes have been extracted in three frames, as in FIGS. 3A to 3C, the silhouette extraction unit 203 averages the three frames (three silhouettes) to give the image in FIG. 3D as the GEI. In FIG. 3D, areas with no motion (static) are white, areas with motion (dynamic) are gray (between white and black), and areas with no person are black.
  • Using the GEI as the feature amount allows it to represent the stride width, the swing width of the hands, the walking speed, and the like while a person is walking. That is, gait authentication can be performed from the captured image (distance image).
  • A GEnI is an image in which the dynamic regions of the GEI are mapped to high brightness and the static regions to low brightness.
  • An MGEI is an image that keeps only the dynamic parts of the GEI.
  • Extracting the silhouette from the distance image in this way means the silhouette is not affected by the colors of the person or background, which do affect extraction from an optical image. That is, the silhouette can be extracted with higher accuracy.
  • The collation unit 205 collates (matches) the feature amount acquired from the feature amount extraction unit 204 against the feature amounts registered in the recording unit 207. For example, when the difference between the two feature amounts is within a threshold, it determines that the person in the distance image and the person whose feature amount was registered are the same person; otherwise, it determines that they are not. Instead of a same/not-same decision, the collation unit 205 may compute an identity probability (the probability of being the same person) from the similarity of the two feature amounts. It outputs the collation result to the output unit 206. The same-person determination by collation is not limited to the above and may use any method.
  • The output unit 206 notifies the user of the collation result acquired from the collation unit 205; that is, whether the person represented by the registered feature amount and the person in the distance image are the same, or the identity probability of the two. For example, the output unit 206 may display the result on a display, announce it by voice, or print it on paper via a printer.
  • The recording unit 207 records the feature amounts registered by the feature amount extraction unit 204 in the registration mode.
  • When registering a feature amount, the recording unit 207 may, according to user input, record the name, ID, and so on of the person in the distance image in association with the feature amount; this lets the collation unit 205 identify the person in a distance image that matches a registered feature amount. The recording unit 207 may also record the programs that run each functional unit.
  • The recording unit 207 can comprise multiple recording members, such as a ROM (Read-Only Memory) for programs important to the system, a RAM (Random Access Memory) for fast access, and an HDD (Hard Disk Drive) for large volumes of data.
  • The image pickup device 100 and the information processing device 200 can each be configured by, for example, a computer that includes a CPU (processor), memory, storage, and the like.
  • In that case, the configuration shown in FIG. 1 is realized by loading the program stored in the storage into the memory and having the CPU execute it.
  • Such a computer may be a general-purpose computer such as a personal computer, server computer, tablet terminal, or smartphone, or an embedded computer such as an onboard computer.
  • All or part of the configuration shown in FIG. 1 may instead be implemented with an ASIC, an FPGA, or the like.
  • All or part of the configuration shown in FIG. 1 may also be realized by cloud computing or distributed computing.
  • The flowchart of FIG. 4 is realized by each functional unit executing a program recorded in the recording unit 207, and starts when the imaging device 100 outputs a distance image to the information processing device 200 (a sketch of this pipeline follows the reference-sign list below).
  • In step S1001, the acquisition unit 201 acquires a distance image from the image pickup device 100 and outputs it to the detection unit 202.
  • In step S1002, the detection unit 202 identifies (detects) the person area in the distance image (detects the person) and outputs the person-area information to the silhouette extraction unit 203.
  • In step S1003, the silhouette extraction unit 203 extracts the silhouette from the person area of the distance image and outputs the silhouette information to the feature amount extraction unit 204.
  • In step S1004, the feature amount extraction unit 204 extracts the feature amount from the silhouette. When a GEI is used as the feature amount, silhouettes for a plurality of frames are required, so the feature amount extraction unit 204 waits until the processing of steps S1001 to S1003 has completed for those frames.
  • In step S1005, the feature amount extraction unit 204 determines whether the information processing device 200 is in the collation mode or the registration mode. In the collation mode, it outputs the feature amount to the collation unit 205 and the process proceeds to step S1006; in the registration mode, the process proceeds to step S1008.
  • In step S1006, the collation unit 205 collates the feature amount acquired from the feature amount extraction unit 204 against the feature amounts registered in the recording unit 207, and outputs the collation result to the output unit 206.
  • In step S1007, the output unit 206 notifies the user of the collation result. The output unit 206 is not limited to notifying the user and may, for example, output the result to an external device.
  • In step S1008, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207.
  • As described above, the silhouette can be generated (extracted) with high accuracy without being affected by the colors of worn items or the background, so an individual can be authenticated with high accuracy.
  • The processing method performed by the information processing apparatus 200 can be regarded as an information processing method or a silhouette extraction method.
  • The information processing device 200 can be regarded as a silhouette extraction device or a processing device.
  • Reference signs: 100: imaging device, 200: information processing device, 201: acquisition unit, 202: detection unit, 203: silhouette extraction unit, 204: feature amount extraction unit, 205: collation unit, 206: output unit, 207: recording unit
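As a non-authoritative illustration of the flow of FIG. 4 (steps S1001 to S1008), the sketch below chains together the helper functions sketched later in the description (detect_person_area, extract_silhouette, gait_energy_image, is_same_person); the registry dictionary and the collation_mode flag are illustrative assumptions, not elements of the patent.

```python
def process_distance_video(frames, background_img, registry, collation_mode=True):
    """Illustrative pipeline for FIG. 4: acquire frames (S1001), detect the
    person area (S1002), extract per-frame silhouettes (S1003), build a GEI
    feature amount (S1004), then collate (S1005-S1007) or register (S1008)."""
    silhouettes = []
    for distance_img in frames:                                   # S1001
        rect = detect_person_area(distance_img, background_img)   # S1002
        if rect is not None:
            silhouettes.append(extract_silhouette(distance_img, rect))  # S1003
    if not silhouettes:
        return None
    feature = gait_energy_image(silhouettes)                      # S1004
    if collation_mode:                                            # S1005
        return {name: is_same_person(feature, registered)         # S1006
                for name, registered in registry.items()}         # S1007: report
    registry["registered person"] = feature                       # S1008
    return None
```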

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

An information processing device according to the present invention comprises: an acquisition means for acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information on the distances between an image pickup device and objects present in an imaging range that includes a target object; and an extraction means for extracting the silhouette of the target object from the first image.

Description

Information processing device, silhouette extraction method, and program
The present invention relates to an information processing device, a silhouette extraction method, and a program.
It is known that the way a person walks differs from person to person. There is therefore a technique called gait authentication, which extracts the silhouette (outer shape; contour) of a person from an image of that person and authenticates the person. In gait authentication, an individual can be authenticated by determining, from the silhouette, the stride length, the size of the arm swing, the speed of the walking pitch, and the degree of bending of the back. Gait authentication thus makes it possible to authenticate an individual even when the captured image has low resolution or the face is not visible.
Patent Document 1 describes a method of extracting a silhouette by taking the difference between an optical image of a walking person and a background image, and a method of extracting a silhouette by binarizing the pixels of an optical image of a walking person.
Patent Document 1: Japanese Unexamined Patent Publication No. H4-33066 (特開平4-33066号公報)
However, with the technique described in Patent Document 1, when the color of the clothes worn by the target person is similar to the background color, the difference cannot be taken properly and binarization fails, so the silhouette may not be extracted accurately.
An object of the present invention is therefore to provide a technique that, when extracting the silhouette of a target object, can extract the silhouette more accurately even when the color of the target region is similar to that of the background region.
To achieve the above object, the present invention adopts the following configuration.
That is, an information processing apparatus according to one aspect of the present invention has an acquisition means for acquiring, as a first image, a distance image: a two-dimensional image whose pixel values encode the distance between the imaging apparatus and objects present in an imaging range that includes a target object; and an extraction means for extracting the silhouette of the target object from the first image.
According to this configuration, because the silhouette of the target object is extracted from a distance image, which does not depend on the color of the target object or the background, the silhouette can be extracted over a more appropriate extent than when it is extracted from an optical image or the like. The silhouette can be regarded as the outer shape or contour of the target object.
The information processing apparatus according to this aspect may further have a detection means for detecting, from the first image, a target region that contains the target object, and the extraction means may extract the silhouette from that target region. Because the silhouette is extracted from the target region after the target region has been detected, the extent of the silhouette is determined in two steps, so the silhouette can be extracted with higher accuracy. The target region may have any shape, such as a rectangle, a circle, or a polygon, as long as it contains the target object.
In the information processing apparatus according to this aspect, the extraction means may extract, as the silhouette, the region of the target region whose distance differs from a representative value of the distances in the target region by no more than a predetermined value. The representative value can be the minimum, the mode, or the average. For example, because the target object is likely to be closer to the image pickup device than the background, using the minimum as the representative value allows the silhouette to be extracted with high accuracy.
In the information processing apparatus according to this aspect, the representative value may be the mode. Within the target region, the target object is expected to occupy a large area, so extracting the region whose distance lies within a predetermined value of the most frequent distance allows the silhouette to be extracted with high accuracy.
The information processing apparatus according to this aspect may further have a recording means that records, as a second image, a distance image of the same imaging range as the first image but not containing the target object, and the detection means may detect the target region based on the difference between the first image and the second image. Between a distance image that contains the target object and one of the same imaging range that does not, there is likely to be no difference except in the region where the target object is present. This configuration therefore allows the target region to be detected with higher accuracy, and also makes it possible to extract the silhouette without using an optical image.
The information processing apparatus according to this aspect may further have a recording means that records, as a third image, an optical image of the same imaging range as the first image but not containing the target object; the acquisition means may acquire, as a fourth image, an optical image of the imaging range captured at the same time as the first image; and the detection means may detect the target region based on the difference between the third image and the fourth image. Between an optical image that contains the target object and one of the same imaging range that does not, there is likely to be no difference except in the region where the target object is present, so this configuration also allows the target region to be detected with higher accuracy.
In the information processing apparatus according to this aspect, the first image may include at least a first frame and a second frame, and the detection means may detect the target region based on the difference between the first frame and the second frame. Between the first frame and the second frame, there is likely to be no difference outside the target object, so the target region can be detected with higher accuracy. In addition, there is no need to keep a distance image or optical image without the target object in the recording means, so the recording capacity can be reduced.
In the information processing apparatus according to this aspect, the detection means may obtain the target region from the first image using a trained model that has undergone machine learning for detecting the target region.
The information processing apparatus according to this aspect may further have a feature amount extraction means for extracting a feature amount from the silhouette.
The information processing apparatus according to this aspect may further have a collation means for collating the feature amount extracted by the feature amount extraction means against a feature amount acquired in advance. This makes it possible to determine whether the target object represented by the previously acquired feature amount and the target object captured in the first image are the same, or how likely they are to be the same.
In the information processing apparatus according to this aspect, the first image may include a plurality of frames, the extraction means may extract a silhouette from each of the frames, and the feature amount extraction means may extract the feature amount based on the plurality of extracted silhouettes.
In the information processing apparatus according to this aspect, the feature amount extraction means may extract, as the feature amount, an image obtained by averaging the plurality of silhouettes extracted by the extraction means. The target object may be a walking person. With such a configuration, the feature amount can represent the stride width, the swing width of the hands, the walking speed, and the like while the person is walking, so more accurate gait authentication can be realized.
A silhouette extraction method according to one aspect of the present invention has an acquisition step of acquiring, as a first image, a distance image: a two-dimensional image whose pixel values encode the distance between the imaging device and objects present in an imaging range that includes a target object; and an extraction step of extracting the silhouette of the target object from the first image.
The present invention may be regarded as a control device having at least some of the above means, or as a silhouette extraction device or a silhouette extraction system. It may also be regarded as an information processing method including at least some of the above processing, or as a method of controlling an information processing device. It can further be regarded as a program for realizing such a method, or as a recording medium on which the program is non-transitorily recorded. The above means and processes can be combined with one another wherever possible to constitute the present invention.
According to the present invention, when extracting the silhouette of a target object, the silhouette can be extracted more accurately even when the color of the target region is similar to that of the background region.
FIG. 1 is a configuration diagram of an information processing system.
FIG. 2A is a diagram showing an optical image containing a person. FIG. 2B is a diagram showing a distance image of the imaging range of the optical image in FIG. 2A. FIG. 2C is a diagram showing an optical image of the same imaging range as FIG. 2A when no person is present. FIG. 2D is a diagram showing an image of the silhouette extracted from the distance image in FIG. 2B.
FIGS. 3A to 3C are diagrams showing images of individual frames of a distance image. FIG. 3D is a diagram showing the feature amount extracted from the images in FIGS. 3A to 3C.
FIG. 4 is a flowchart showing the processing of the information processing apparatus.
Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.
<Application example>
The following describes an information processing system according to the present embodiment. The information processing system extracts the silhouette of a person from a distance image: a two-dimensional image whose pixel values encode the distance between the imaging device 100 and objects present in the imaging range 10, which includes a person (a moving object) as the target. Unlike an optical image, a distance image is not affected by color, so the silhouette of a person can be acquired (extracted) accurately. This embodiment describes gait authentication using a distance image, and the distance image is assumed to be a moving image (video) composed of a plurality of frames.
[Configuration of the information processing system]
As shown in FIG. 1, the information processing system according to the present embodiment includes an image pickup device 100 and an information processing device 200. The information processing device 200 may be contained in the image pickup device 100, or the image pickup device 100 may be contained in the information processing device 200.
By imaging the imaging range 10, the imaging device 100 acquires a distance image showing the distance between the objects present in the imaging range 10 and the imaging device 100. The imaging device 100 can obtain this distance with a TOF (Time of Flight) sensor, or from the parallax between two optical images taken by a stereo camera. When the imaging device 100 has an infrared irradiation unit and an imaging unit, the distance may instead be derived from the emission angle of the light leaving the irradiation unit, the angle at which the light reflected by the object enters the imaging unit, and the distance between the irradiation unit and the imaging unit. The imaging device 100 may also be able to acquire an optical image of the imaging range 10 at the same time as the distance image. The imaging device 100 outputs the acquired distance image to the information processing device 200.
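For the stereo-camera case, the distance of a point follows from the parallax (disparity) between the two rectified views. The function below is a minimal sketch of that standard relation, not text from the patent; focal_px (focal length in pixels) and baseline_m (camera separation in meters) are assumed parameters.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a point under a rectified pinhole stereo model: Z = f * B / d.
    Larger disparity (a nearer point) gives a smaller depth."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px
```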
The distance image is an image composed of pixels whose values depend on distance. For example, as shown in FIG. 2B, pixels closer to the image pickup apparatus 100 are brighter and pixels farther away are darker. The opposite convention may also be used, with farther pixels brighter and closer pixels darker. The distance image may also be rendered in colors corresponding to the distance from the image pickup apparatus 100, or it may be an optical image in which each pixel additionally carries a numerical value (information) for the distance from the image pickup apparatus 100.
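As a small illustration of the FIG. 2B convention (nearer = brighter), a distance image given in meters can be rendered as an 8-bit grayscale image; max_range_m is an assumed display range, not a value from the patent.

```python
import numpy as np

def render_distance_image(distance_m: np.ndarray, max_range_m: float = 5.0) -> np.ndarray:
    """8-bit visualization of a distance image: nearer pixels brighter,
    farther pixels darker, saturating at max_range_m."""
    clipped = np.clip(distance_m, 0.0, max_range_m)
    return (255 * (1.0 - clipped / max_range_m)).astype(np.uint8)
```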
(Configuration of the information processing device)
The information processing device 200 includes an acquisition unit 201, a detection unit 202, a silhouette extraction unit 203, a feature amount extraction unit 204, a collation unit 205, an output unit 206, and a recording unit 207.
The acquisition unit 201 acquires the distance image from the image pickup device 100. When the image pickup apparatus 100 acquires an optical image at the same time as the distance image, the acquisition unit 201 also acquires that optical image from the image pickup apparatus 100. Any communication method, wired or wireless, may be used between the image pickup apparatus 100 and the acquisition unit 201 (information processing apparatus 200).
The detection unit 202 detects a person in the distance image. Here, "detecting a person" means identifying (detecting) the area of the distance image that contains a person. The detection unit 202 does not determine the person's silhouette (outer shape; contour); it identifies, for example, a rectangular area containing the person, that is, an area wider than the area corresponding to the silhouette. The detection unit 202 outputs information on the area containing the person to the silhouette extraction unit 203. The identified area need not be rectangular and may have any shape, such as a circle or a polygon.
For example, the detection unit 202 takes the difference (in pixel values, colors, and so on) between the optical image acquired at the same time as the distance image, as in FIG. 2A, and an optical image (background image) of the imaging range 10 captured without a person, as in FIG. 2C. These two images cover the same imaging range. The detection unit 202 then determines that a person is present in a rectangular range 302 of the distance image, as in FIG. 2B, containing the region where the difference is at least a predetermined value. The person-free optical image of the imaging range 10, as in FIG. 2C, is acquired in advance by the imaging apparatus 100 and recorded in the recording unit 207. Hereinafter, the area identified as containing a person is called the "person area (target area)".
Other methods may also be used to detect a person in the distance image, such as the following (1) to (3); a sketch of method (1) follows the list.

(1) The detection unit 202 takes the difference (in pixel values, colors, and so on) between the acquired distance image and a distance image (background distance image) whose pixel values encode the distance between the imaging device 100 and the objects present in the imaging range 10 without a person. These two images cover the same imaging range. The detection unit 202 then identifies (detects) as the person area a rectangular area containing the region where the difference is at least a predetermined value. The person-free background distance image is acquired in advance by the imaging device 100 and recorded in the recording unit 207.

(2) The detection unit 202 takes the difference between the current frame of the acquired distance image and the preceding frame, and identifies (detects) as the person area an area containing the region where the distance in the current frame is shorter than in the preceding frame.

(3) The detection unit 202 holds a learning model (trained model) trained by machine learning such as deep learning, and obtains the person area in a distance image by feeding the distance image to the trained model. The trained model can be generated as follows: pairs of a distance image and the person-area information for that image are fed to the learning model as training data, and a trained model is generated from the training data based on an algorithm such as an SVM or a neural network.
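A minimal sketch of method (1), assuming the distance images are NumPy arrays of per-pixel distances in meters; diff_threshold_m stands in for the unspecified "predetermined value" and is a tuning assumption.

```python
import numpy as np

def detect_person_area(distance_img: np.ndarray,
                       background_img: np.ndarray,
                       diff_threshold_m: float = 0.3):
    """Return the bounding rectangle (x, y, w, h) of the pixels whose distance
    differs from the person-free background distance image by at least the
    threshold, or None when no such pixels exist."""
    changed = np.abs(distance_img - background_img) >= diff_threshold_m
    ys, xs = np.nonzero(changed)
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
```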
The silhouette extraction unit 203 extracts the person's silhouette (outer shape; contour) from the person area of the distance image. Specifically, the silhouette extraction unit 203 extracts, as the person's silhouette, the region of the person area whose distance lies within a predetermined value of a representative value of the distances in the person area (for example, when the representative value is 250 cm and the predetermined value 10 cm, the range 240 to 260 cm). The predetermined value can be, for example, smaller than the thickness of the human body (for example, 50 cm). As long as the silhouette can be extracted from the person area of the distance image, any method may be used; for example, the silhouette may be extracted based on similarity to a pre-stored human shape. The representative value can be the mode, the average, or the minimum. The silhouette extraction unit 203 outputs the extracted silhouette information to the feature amount extraction unit 204.

For example, when the person area has been identified as in FIG. 2B, the silhouette extraction unit 203 takes as the representative value the distance shared by the largest number of pixels in the person area. It then extracts the person's silhouette from the distance image, as in FIG. 2D, by identifying the pixel region whose distance lies within the predetermined value of the representative value.
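A sketch of this mode-based extraction under the same array assumptions as above; quantizing the distances to 1 cm bins so that the mode of a continuous-valued image is well defined is an implementation choice, not something the patent specifies.

```python
import numpy as np

def extract_silhouette(distance_img: np.ndarray,
                       person_rect: tuple,
                       tolerance_m: float = 0.5) -> np.ndarray:
    """Binary silhouette mask: pixels inside the person rectangle whose distance
    lies within tolerance_m of the most frequent (mode) distance there."""
    x, y, w, h = person_rect
    roi = distance_img[y:y + h, x:x + w]
    bins = np.round(roi * 100).astype(np.int64)          # 1 cm quantization
    values, counts = np.unique(bins, return_counts=True)
    representative_m = values[np.argmax(counts)] / 100.0  # mode distance
    mask = np.zeros(distance_img.shape, dtype=bool)
    mask[y:y + h, x:x + w] = np.abs(roi - representative_m) <= tolerance_m
    return mask
```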
The feature amount extraction unit 204 extracts a feature amount from the silhouette extracted by the silhouette extraction unit 203. The information processing apparatus 200 has a collation mode and a registration mode: in the registration mode, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207; in the collation mode, it outputs the feature amount to the collation unit 205.
Here, the feature amount extraction unit 204 can also use the extracted silhouette itself as the feature amount. In the field of gait authentication, a GEI (Gait Energy Image), GEnI (Gait Entropy Image), MGEI (Masked GEI), FDF (Frequency-Domain Features), CGI (Chrono-Gait Image), and the like can also be used as feature amounts.
A GEI is an image obtained by averaging consecutive frames. For example, when silhouettes have been extracted in three frames, as in FIGS. 3A to 3C, the silhouette extraction unit 203 averages the three frames (three silhouettes) to give the image shown in FIG. 3D as the GEI. In FIG. 3D, areas with no motion (static areas) are white, areas with motion (dynamic areas) are gray (between white and black), and areas with no person are black. Using the GEI as the feature amount allows it to represent the stride width, the swing width of the hands, the walking speed, and the like while the person is walking; that is, gait authentication can be performed from the captured image (distance image).
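A minimal GEI sketch, assuming the per-frame silhouettes are aligned, same-sized binary NumPy masks:

```python
import numpy as np

def gait_energy_image(silhouettes: list) -> np.ndarray:
    """GEI: per-pixel average of a sequence of aligned binary silhouette masks.
    Values near 1 mark static body parts, intermediate grays mark moving parts,
    and 0 marks pixels where the person never appears."""
    stack = np.stack([np.asarray(s, dtype=np.float64) for s in silhouettes])
    return stack.mean(axis=0)
```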
Also, for example, a GEnI is an image in which the dynamic regions of the GEI are mapped to high brightness and the static regions to low brightness, and an MGEI is an image that keeps only the dynamic parts of the GEI.
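The patent does not give formulas for GEnI or MGEI; a common realization, shown here as an assumption, takes the per-pixel binary Shannon entropy of the GEI (high where pixels flicker between silhouette and background) and masks the GEI by its high-entropy pixels.

```python
import numpy as np

def gait_entropy_image(gei: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """GEnI sketch: binary Shannon entropy per pixel. Dynamic regions (GEI
    values near 0.5) map to high brightness, static regions to low."""
    p = np.clip(gei, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def masked_gei(gei: np.ndarray, entropy_threshold: float = 0.5) -> np.ndarray:
    """MGEI sketch: keep only the dynamic parts of the GEI by zeroing pixels
    whose entropy falls below the threshold."""
    return np.where(gait_entropy_image(gei) >= entropy_threshold, gei, 0.0)
```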
 By extracting the silhouette from the distance image in this way, the silhouette can be extracted without being affected by the colors of the person or the background, which would affect extraction from an optical image. That is, the silhouette can be extracted with higher accuracy.
 The collation unit 205 collates (matches) the feature amount acquired from the feature amount extraction unit 204 against the feature amounts registered in the recording unit 207. For example, when the difference between the two feature amounts is within a threshold, the collation unit 205 determines that the person in the distance image and the person whose feature amount was registered are the same person; otherwise, it determines that they are not the same person. Instead of making a same-person determination, the collation unit 205 may calculate an identity probability (the probability of being the same person) according to the similarity of the two feature amounts. The collation unit 205 outputs the collation result to the output unit 206. Note that the same-person determination by feature-amount collation is not limited to the method described above and may be made by any method.
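 A minimal sketch of such a threshold test, assuming the two feature amounts are arrays of equal shape (e.g., GEIs) and that their difference is measured by Euclidean distance; the metric and the threshold are assumptions, not fixed by the specification:

```python
import numpy as np

def is_same_person(query_feature: np.ndarray,
                   registered_feature: np.ndarray,
                   threshold: float) -> bool:
    """True when the feature difference is within the threshold."""
    difference = np.linalg.norm(query_feature.ravel()
                                - registered_feature.ravel())
    return difference <= threshold
```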
 The output unit 206 notifies the user of the collation result acquired from the collation unit 205. That is, the output unit 206 reports whether the person indicated by the registered feature amount and the person in the distance image are the same, or reports the identity probability of the two. For example, the output unit 206 may display the collation result on a display, announce it by voice, or output it on paper via a printer.
 The recording unit 207 records the feature amounts registered by the feature amount extraction unit 204 in the registration mode. When registering a feature amount, the recording unit 207 may record, in accordance with user input, the name, ID, and the like of the person appearing in the distance image of that feature amount in association with the feature amount. With this association, the collation unit 205 can identify the person appearing in a distance image that matches a feature amount registered in the recording unit 207. The recording unit 207 may also record the programs by which each functional unit operates. Note that the recording unit 207 can include a plurality of recording members, such as a ROM (Read-Only Memory) that stores programs important to the system, a RAM (Random Access Memory) that enables high-speed access, and an HDD (Hard Disk Drive) that stores large volumes of data.
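 A minimal sketch of that association, with a simple in-memory mapping standing in for the recording unit (all names are illustrative):

```python
import numpy as np

# Registered entries: person ID -> (name, feature amount).
registry = {}

def register(person_id: str, name: str, feature: np.ndarray) -> None:
    """Record a feature amount together with the person's name and ID."""
    registry[person_id] = (name, feature)
```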
 The imaging device 100 and the information processing apparatus 200 can each be configured as a computer including, for example, a CPU (processor), memory, and storage. In that case, the configuration shown in FIG. 1 is realized by loading a program stored in the storage into the memory and having the CPU execute it. Such a computer may be a general-purpose computer such as a personal computer, server computer, tablet terminal, or smartphone, or an embedded computer such as an onboard computer. Alternatively, all or part of the configuration shown in FIG. 1 may be implemented with an ASIC, an FPGA, or the like, or realized by cloud computing or distributed computing.
[Processing of the information processing apparatus]
 Hereinafter, the processing of the information processing apparatus 200 will be described with reference to the flowchart of FIG. 4. The flowchart of FIG. 4 is realized by each functional unit executing a program recorded in the recording unit 207, and it starts when the imaging device 100 outputs a distance image to the information processing apparatus 200.
 In step S1001, the acquisition unit 201 acquires a distance image from the imaging device 100 and outputs the acquired distance image to the detection unit 202.
 In step S1002, the detection unit 202 identifies (detects) the person region in the distance image (i.e., detects the person) and outputs the person-region information to the silhouette extraction unit 203.
 In step S1003, the silhouette extraction unit 203 extracts the silhouette from the person region of the distance image and outputs the silhouette information to the feature amount extraction unit 204.
 In step S1004, the feature amount extraction unit 204 extracts a feature amount from the silhouette. When a GEI is used as the feature amount, silhouettes from a plurality of frames are required, so the feature amount extraction unit 204 waits until the processing of steps S1001 to S1003 has been completed for the plurality of frames of the distance image.
 In step S1005, the feature amount extraction unit 204 determines whether the information processing apparatus 200 is in the collation mode or the registration mode. In the collation mode, the feature amount extraction unit 204 outputs the feature amount to the collation unit 205 and the process proceeds to step S1006; in the registration mode, the process proceeds to step S1008.
 In step S1006, the collation unit 205 collates the feature amount acquired from the feature amount extraction unit 204 against the feature amounts registered in the recording unit 207, and outputs the collation result to the output unit 206.
 In step S1007, the output unit 206 notifies the user of the collation result. The output unit 206 is not limited to notifying the user; it may, for example, output the collation result to an external device.
 In step S1008, the feature amount extraction unit 204 registers (records) the feature amount in the recording unit 207.
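 To make the flow of FIG. 4 concrete, here is a minimal end-to-end sketch composing the illustrative functions above; the person detection of step S1002 is stubbed out, since the specification leaves the concrete detection method open:

```python
from typing import Optional
import numpy as np

def process_frame(depth_cm: np.ndarray,
                  registered_feature: Optional[np.ndarray],
                  threshold: float,
                  collation_mode: bool):
    # S1002: detect the person region (stub: treat non-background pixels
    # as the person; a real detector could use background subtraction,
    # frame differences, or a trained model).
    person_mask = depth_cm < depth_cm.max()
    # S1003: extract the silhouette from the person region.
    silhouette = extract_silhouette(depth_cm, person_mask)
    # S1004: here the silhouette itself serves as the feature amount.
    feature = silhouette.astype(np.float64)
    if collation_mode and registered_feature is not None:
        # S1006: collate against the registered feature amount.
        return is_same_person(feature, registered_feature, threshold)
    # S1008: registration mode - return the feature amount for recording.
    return feature
```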
 As described above, according to the present embodiment, since a distance image is used, the silhouette can be generated (extracted) with high accuracy without being affected by the colors of ornaments or the background. Accordingly, an individual can be authenticated with high accuracy.
 Although the present embodiment has described an example of extracting a person's silhouette, the silhouette of any target object, such as an animal or a thing, can be extracted; the target is not limited to a person.
 The processing method performed by the information processing apparatus 200 can also be regarded as an information processing method or a silhouette extraction method, and the information processing apparatus 200 can also be regarded as a silhouette extraction device or a processing device.
 Note that the interpretation of the claims is not limited only to the matters described in the embodiment. The interpretation of the claims also includes the scope described such that a person skilled in the art can recognize, in view of the common general technical knowledge at the time of filing, that the problem of the invention can be solved.
 (Appendix 1)
 An information processing apparatus (200) comprising:
 an acquisition means (201) for acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information on the distance between an imaging device (100) and objects existing in an imaging range (10) including a target object; and
 an extraction means (203) for extracting the silhouette of the target object from the first image.
 (Appendix 2)
 A silhouette extraction method comprising:
 an acquisition step (S1001) of acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information on the distance between an imaging device (100) and objects existing in an imaging range (10) including a target object; and
 an extraction step (S1003) of extracting the silhouette of the target object from the first image.
100: imaging device, 200: information processing apparatus, 201: acquisition unit, 202: detection unit, 203: silhouette extraction unit, 204: feature amount extraction unit, 205: collation unit, 206: output unit, 207: recording unit

Claims (15)

  1.  An information processing apparatus comprising:
      an acquisition means for acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information on the distance between an imaging device and objects existing in an imaging range including a target object; and
      an extraction means for extracting the silhouette of the target object from the first image.
  2.  The information processing apparatus according to claim 1, further comprising a detection means for detecting, from the first image, a target-object region including the target object,
      wherein the extraction means extracts the silhouette from the target-object region.
  3.  The information processing apparatus according to claim 2, wherein the extraction means extracts, as the silhouette, a region of the target-object region indicating distances whose difference from a representative value of the distances indicated by the target-object region is within a predetermined value.
  4.  The information processing apparatus according to claim 3, wherein the representative value is the mode.
  5.  The information processing apparatus according to any one of claims 2 to 4, further comprising a recording means for recording, as a second image, a distance image that does not include the target object and that captures the same imaging range as when the imaging device captured the first image,
      wherein the detection means detects the target-object region based on the difference between the first image and the second image.
  6.  The information processing apparatus according to any one of claims 2 to 4, further comprising a recording means for recording, as a third image, an optical image that does not include the target object and that captures the same imaging range as when the imaging device captured the first image,
      wherein the acquisition means acquires, as a fourth image, an optical image capturing the imaging range at the same time as the first image, and
      the detection means detects the target-object region based on the difference between the third image and the fourth image.
  7.  The information processing apparatus according to any one of claims 2 to 4, wherein the first image includes at least a first frame and a second frame, and
      the detection means detects the target-object region based on the difference between the first frame and the second frame.
  8.  The information processing apparatus according to any one of claims 2 to 4, wherein the detection means acquires the target-object region from the first image using a trained model that has undergone machine learning for detecting the target-object region.
  9.  The information processing apparatus according to any one of claims 1 to 8, further comprising a feature amount extraction means for extracting a feature amount of the silhouette.
  10.  The information processing apparatus according to claim 9, further comprising a collation means for collating the feature amount extracted by the feature amount extraction means against a feature amount acquired in advance.
  11.  The information processing apparatus according to claim 9 or 10, wherein the first image includes a plurality of frames,
      the extraction means extracts a silhouette from each of the plurality of frames, and
      the feature amount extraction means extracts the feature amount based on the plurality of silhouettes extracted by the extraction means.
  12.  The information processing apparatus according to claim 11, wherein the feature amount extraction means extracts, as the feature amount, an image obtained by averaging the plurality of silhouettes extracted by the extraction means.
  13.  The information processing apparatus according to any one of claims 1 to 12, wherein the target object is a walking person.
  14.  A silhouette extraction method comprising:
      an acquisition step of acquiring, as a first image, a distance image that is a two-dimensional image having, as pixel values, information on the distance between an imaging device and objects existing in an imaging range including a target object; and
      an extraction step of extracting the silhouette of the target object from the first image.
  15.  A program for causing a computer to execute each step of the silhouette extraction method according to claim 14.
PCT/JP2020/047061 2020-02-04 2020-12-16 Information processing device, method for extracting silhouette, and program WO2021157214A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-016931 2020-02-04
JP2020016931A JP2021124868A (en) 2020-02-04 2020-02-04 Information processing device, silhouette extraction method, program

Publications (1)

Publication Number Publication Date
WO2021157214A1 true WO2021157214A1 (en) 2021-08-12

Family

ID=77199210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/047061 WO2021157214A1 (en) 2020-02-04 2020-12-16 Information processing device, method for extracting silhouette, and program

Country Status (2)

Country Link
JP (1) JP2021124868A (en)
WO (1) WO2021157214A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023175652A1 (en) * 2022-03-14 2023-09-21 日本電気株式会社 Moving image generating device, moving image generating method, and moving image generating program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017205135A (en) * 2014-08-25 2017-11-24 ノーリツプレシジョン株式会社 Individual identification device, individual identification method, and individual identification program
JP2018124890A (en) * 2017-02-03 2018-08-09 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program

Also Published As

Publication number Publication date
JP2021124868A (en) 2021-08-30

Similar Documents

Publication Publication Date Title
Priesnitz et al. An overview of touchless 2D fingerprint recognition
Patel et al. Secure face unlock: Spoof detection on smartphones
Patel et al. Live face video vs. spoof face video: Use of moiré patterns to detect replay video attacks
US8406484B2 (en) Facial recognition apparatus, method and computer-readable medium
KR102466997B1 (en) Liveness test method and apparatus
US11580775B2 (en) Differentiating between live and spoof fingers in fingerprint analysis by machine learning
KR101596298B1 (en) Contactless fingerprint image acquistion method using smartphone
JP2018032391A (en) Liveness test method and apparatus
Chugh et al. Fingerprint spoof detection using minutiae-based local patches
KR20180022677A (en) Device and computer implementation method for fingerprint-based authentication
De Marsico et al. Insights into the results of miche i-mobile iris challenge evaluation
JP2015529365A5 (en)
US11227149B2 (en) Method and apparatus with liveness detection and object recognition
US8264327B2 (en) Authentication apparatus, image sensing apparatus, authentication method and program therefor
Johnson et al. Fingerprint pore characteristics for liveness detection
Diwakar et al. An extraction and recognition of tongue-print images for biometrics authentication system
CN107346419B (en) Iris recognition method, electronic device, and computer-readable storage medium
JP7151875B2 (en) Image processing device, image processing method, and program
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
Cardia Neto et al. 3DLBP and HAOG fusion for face recognition utilizing Kinect as a 3D scanner
Pinto et al. Counteracting presentation attacks in face, fingerprint, and iris recognition
Bresan et al. Facespoof buster: a presentation attack detector based on intrinsic image properties and deep learning
Kolberg et al. Colfispoof: A new database for contactless fingerprint presentation attack detection research
WO2021157214A1 (en) Information processing device, method for extracting silhouette, and program
JP2005259049A (en) Face collation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20917965

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20917965

Country of ref document: EP

Kind code of ref document: A1