WO2014006786A1 - Feature value extraction device and method - Google Patents

Feature value extraction device and method

Info

Publication number
WO2014006786A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
mask
dimensional
voxel
feature amount
Prior art date
Application number
PCT/JP2013/000635
Other languages
English (en)
Japanese (ja)
Inventor
裕紀 森
大 広瀬
稔 浅田
Original Assignee
国立大学法人大阪大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国立大学法人大阪大学 filed Critical 国立大学法人大阪大学
Publication of WO2014006786A1 publication Critical patent/WO2014006786A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the present invention relates to a feature quantity extraction device and a feature quantity extraction method, and more particularly to a feature quantity extraction apparatus and a feature quantity extraction method for extracting feature quantities of 4D point cloud data that is time-series data of 3D point cloud data.
  • The motion identification technology that identifies the movement of a target from moving image data can be applied to a wide range of applications such as gesture recognition, suspicious-person detection, and animal monitoring, and its potential social contribution is therefore large.
  • Compared with motion identification using images captured by an ordinary camera or the like that does not use stereoscopic information, motion identification using information in three-dimensional space (three-dimensional information) acquired by stereo vision or a laser range finder can use information that is more faithful to reality. For this reason, high discrimination ability and versatility applicable to all kinds of targets are expected.
  • SIFT (Scale-Invariant Feature Transform)
  • SURF (Speeded Up Robust Features)
  • HOG (Histogram of Oriented Gradients)
  • HLAC (Higher-order Local Auto-Correlation; for example, see Patent Document 1)
  • CHLAC (Cubic HLAC)
  • the movement of the object in the real world can be described as a time change of the position of the object in the 3D space. That is, the movement of the object can be described as four-dimensional information.
  • In these conventional techniques, the information used for acquiring the feature amount is limited to three-dimensional or two-dimensional information. For this reason, even if the movement of a target is identified using these feature amounts, it is difficult to identify the movement with high accuracy because of the insufficient number of dimensions.
  • The present invention has been made to solve the above-described problem, and has as its object to provide a feature quantity extraction device that extracts feature quantities capable of identifying the motion of a target with high accuracy without limiting the identification target.
  • A feature value extraction apparatus according to one aspect of the present invention extracts feature values of four-dimensional point cloud data, which is time-series data of three-dimensional point cloud data. The apparatus includes a feature amount extraction unit that, for each mask specifying the data positions of at least one data item including the data of interest, scans the mask over the four-dimensional point cloud data, calculates at each scanning position the product of the pixel values of the four-dimensional point cloud data at the data positions specified by the mask, sums these products over the four-dimensional point cloud data, and extracts a feature amount vector whose elements are the sums calculated for the respective masks as the feature amount of the four-dimensional point cloud data. For each mask, there is no other mask that coincides with it when translated in any one of the four dimensions.
  • The present invention can be realized not only as a feature quantity extraction device including such characteristic processing units, but also as a feature quantity extraction method having, as steps, the processes executed by the characteristic processing units included in the feature quantity extraction device. It can also be realized as a program for causing a computer to function as the characteristic processing units included in the feature quantity extraction device, or as a program for causing a computer to execute the characteristic steps included in the feature quantity extraction method. Such a program can of course be distributed via a computer-readable non-transitory recording medium such as a CD-ROM (Compact Disc Read-Only Memory) or via a communication network such as the Internet.
  • According to the present invention, it is possible to provide a feature quantity extraction device that extracts feature quantities capable of identifying the motion of a target with high accuracy without limiting the identification target.
  • FIG. 1 is a block diagram showing a functional configuration of a moving image identification apparatus according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing combinations (masks) of displacement vectors in HLAC.
  • FIG. 3 is a diagram schematically illustrating the feature amount calculation process by HLAC extended to four dimensions (4D-HLAC).
  • FIG. 4 is a diagram for explaining the principle of the process of estimating the number of operations by the pattern identification unit.
  • FIG. 5 is a diagram for explaining three operations.
  • FIG. 6A is a diagram illustrating an example of a luminance image.
  • FIG. 6B is a diagram illustrating an example of a depth image.
  • FIG. 6C is a diagram illustrating an example of an image of three-dimensional voxel data output from the voxel conversion unit.
  • FIG. 7 is a diagram illustrating a result of comparing this method with another method.
  • FIG. 8 is a diagram showing a situation in which three people are operating simultaneously.
  • FIG. 9 is a diagram illustrating an example of a motion discrimination result.
  • HLAC is a simple technique with low calculation cost, and yields feature quantities with excellent properties such as position invariance and additivity; it can be applied not only to images but also to tactile sensor data or audio data.
  • CHLAC has been proposed to calculate feature amounts of three-dimensional array data such as point cloud data (x, y, z) or moving images (x, y, t), and it has been demonstrated to have excellent properties in applications such as human gait recognition.
  • CHLAC is a feature amount calculated from point cloud data (x, y, z) or a moving image (x, y, t).
  • a feature quantity extraction device that extracts a feature quantity that can identify a motion of a target with high accuracy without limiting the identification target will be described.
  • A feature quantity extraction device according to one aspect of the present invention extracts feature quantities of 4D point cloud data, which is time-series data of 3D point cloud data. For each mask that specifies the data positions of at least one data item including the data of interest, the device scans the mask over the four-dimensional point cloud data and calculates, at each scanning position, the sum over the four-dimensional point cloud data of the product of the values at the data positions specified by the mask.
  • According to this configuration, feature quantities can be extracted from the 4D point cloud data.
  • the four-dimensional point cloud data includes information about the depth direction of the target and information about the temporal movement of the target.
  • the processing of the feature amount extraction unit is not limited to a specific target. For this reason, it is possible to extract a feature quantity that can identify the movement of the object with high accuracy without limiting the identification object.
  • The above-described feature quantity extraction device may further include a voxel conversion unit that converts the four-dimensional point cloud data, which is time-series data of the three-dimensional point cloud data, into four-dimensional voxel data, which is time-series data of three-dimensional voxel data, by representing each of the three-dimensional point cloud data constituting the four-dimensional point cloud data as three-dimensional voxel data in which each voxel, obtained by dividing the space into a grid of a predetermined size, has a voxel value indicating whether or not a point exists in that voxel. In this case, for each of the masks, the feature amount extraction unit scans the mask over the four-dimensional voxel data instead of the four-dimensional point cloud data and calculates, at each scanning position, the sum over the four-dimensional voxel data of the product of the voxel values of the four-dimensional voxel data at the data positions designated by the mask.
  • The number of voxels included in the 4D voxel data is smaller than the number of points included in the 4D point cloud data.
  • the value of each voxel included in the four-dimensional voxel data is binary. For this reason, the data size of the four-dimensional voxel data is smaller than the data size of the four-dimensional point cloud data. Therefore, if the feature amount extracted by this feature amount extraction apparatus is used, the movement of the target can be identified at high speed.
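  • As an illustration of the voxel conversion described above, the following is a minimal sketch (not part of the patent; the grid bounds, voxel size, and function names are assumptions) that converts one frame of a 3D point cloud into binary 3D voxel data using numpy. Stacking such grids over successive frames yields the four-dimensional voxel data.

```python
import numpy as np

def voxelize(points, bounds, voxel_size):
    """Convert an (N, 3) array of x, y, z points into a binary 3D voxel grid.

    points     : (N, 3) float array of 3D coordinates
    bounds     : ((xmin, ymin, zmin), (xmax, ymax, zmax)) region of interest
    voxel_size : edge length of one cubic voxel
    """
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    shape = np.ceil((hi - lo) / voxel_size).astype(int)
    grid = np.zeros(shape, dtype=np.uint8)

    # Keep only points inside the region of interest.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = ((points[inside] - lo) / voxel_size).astype(int)

    # A voxel is 1 if at least one point falls into it, 0 otherwise.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```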
  • The above-described feature amount extraction apparatus may further include an inter-frame difference unit that calculates four-dimensional difference image data, which is time-series data of difference images having the difference values as voxel values, by calculating the difference value of each voxel value of the three-dimensional voxel data between temporally adjacent frames. In this case, for each of the masks, the feature amount extraction unit scans the mask over the four-dimensional difference image data instead of the four-dimensional point cloud data and the four-dimensional voxel data, calculates at each scanning position the sum, over the four-dimensional difference image data, of the product of the voxel values of the four-dimensional difference image data at the data positions specified by the mask, and extracts a feature quantity vector whose elements are the sums calculated for the respective masks as the feature quantity of the four-dimensional difference image data.
  • the difference of each voxel value of the 3D voxel data between frames indicates whether or not there is a change in each voxel.
  • By using the feature amount extracted from the four-dimensional difference image data, it is possible to identify the target motion with high accuracy.
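  • The inter-frame difference on the binary voxel sequence can be sketched as follows (an illustration, not the patent's implementation), keeping only whether each voxel changed between temporally adjacent frames:

```python
import numpy as np

def frame_difference(voxel_seq):
    """voxel_seq : (T, X, Y, Z) binary array of 3D voxel frames.

    Returns a (T-1, X, Y, Z) binary array in which a voxel is 1 where the
    occupancy changed between temporally adjacent frames, and 0 elsewhere.
    """
    changed = voxel_seq[1:] != voxel_seq[:-1]
    return changed.astype(np.uint8)
```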
  • The above-described feature amount extraction apparatus may further include an inter-frame difference unit that calculates four-dimensional difference image data, which is time-series data of difference images having the difference values as pixel values, by calculating the difference value of each pixel value of the three-dimensional point cloud data between temporally adjacent frames. In this case, for each of the masks, the feature amount extraction unit scans the mask over the four-dimensional difference image data instead of the four-dimensional point cloud data, calculates at each scanning position the sum, over the four-dimensional difference image data, of the product of the pixel values of the four-dimensional difference image data at the data positions designated by the mask, and extracts a feature quantity vector whose elements are the sums calculated for the respective masks as the feature quantity of the four-dimensional difference image data.
  • the difference of each pixel value of the 3D point cloud data between frames indicates whether or not there is a change in each pixel.
  • By using the feature amount extracted from the four-dimensional difference image data, it is possible to identify the target motion with high accuracy.
  • When the data to be scanned by the feature quantity extraction unit is binary data of 1 or 0, the set of masks may be reduced as follows: if there is a first mask that designates the same data position a plurality of times and a second mask that designates the same data positions as the first mask but designates each of them only once, the first mask may be deleted.
  • This is because 1 raised to any power is 1 (and 0 raised to any power is 0), so the product of the pixel values of the four-dimensional point cloud data calculated using the first mask and the product calculated using the second mask have the same value. Therefore, by deleting the first mask, the amount of calculation for extracting the feature amount can be reduced.
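  • The redundancy described above can be removed mechanically. The following hypothetical helper represents a mask as a collection of offsets (including the offset of the data of interest) and, for binary data, keeps only one representative per distinct offset set:

```python
def prune_binary_masks(masks):
    """masks : iterable of masks, each an iterable of 4D offsets (dx, dy, dz, dt).

    For binary data, a mask that uses the same offset several times yields the
    same product as the mask that uses it once, so only one mask per distinct
    offset set is kept.
    """
    kept, seen = [], set()
    for mask in masks:
        key = frozenset(tuple(o) for o in mask)  # collapse repeated offsets
        if key not in seen:                      # drop the redundant "first mask"
            seen.add(key)
            kept.append(sorted(key))
    return kept
```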
  • the mask designates the data position of the data of interest and the data positions of N pieces of data (N is an integer of 0 or more) located within a predetermined distance range from the data of interest.
  • FIG. 1 is a block diagram showing a functional configuration of a moving image identification apparatus according to an embodiment of the present invention.
  • the moving image identification device 100 is a device that identifies a target motion from 4D point cloud data that is time-series data of 3D point cloud data, and includes a feature quantity extraction device 10 and a pattern identification unit 20.
  • The position of each pixel constituting the four-dimensional point cloud data can be represented by an x coordinate, a y coordinate, a z coordinate, and a t coordinate.
  • the x coordinate, the y coordinate, and the z coordinate indicate the coordinate values of the x axis, the y axis, and the z axis in the three-dimensional space.
  • the t coordinate indicates the coordinate value of the t axis (time axis).
  • the pixel value of each pixel constituting the four-dimensional point cloud data can be expressed as I (x, y, z, t).
  • The position of each pixel of one piece of three-dimensional point cloud data constituting the four-dimensional point cloud data can be represented by an x coordinate, a y coordinate, and a z coordinate.
  • the pixel value of each pixel constituting the three-dimensional point cloud data can be expressed as I (x, y, z).
  • the feature quantity extraction device 10 is a device that extracts the feature quantity of the four-dimensional point cloud data, and includes a voxel conversion unit 11, an inter-frame difference unit 12, and a feature quantity extraction unit 13.
  • The voxel conversion unit 11 converts the four-dimensional point cloud data, which is time-series data of the three-dimensional point cloud data, into four-dimensional voxel data, which is time-series data of three-dimensional voxel data, by dividing each of the three-dimensional point cloud data constituting the four-dimensional point cloud data into a grid of voxels of a predetermined size and setting, as the voxel value of each voxel, whether or not a point exists in that voxel.
  • An image 31 shown in FIG. 1 is an image obtained by viewing one piece of three-dimensional point cloud data constituting the four-dimensional point cloud data from a predetermined direction. Each pixel value of the image 31 corresponds to the pixel value of any pixel of the four-dimensional point cloud data.
  • the image 32 shown in FIG. 1 is an image obtained by viewing one piece of three-dimensional voxel data constituting the four-dimensional voxel data from a predetermined direction.
  • Each cube shown in the image 32 indicates a voxel in which a point exists in the voxel among the voxels constituting the three-dimensional voxel data. That is, it indicates that the object exists at a position in the three-dimensional space of the voxel represented by the cube.
  • The inter-frame difference unit 12 calculates four-dimensional difference image data, which is time-series data of difference images having the difference values as voxel values, by calculating the difference value of each voxel value of the three-dimensional voxel data between temporally adjacent frames.
  • the image 33 shown in FIG. 1 is an image obtained by viewing one difference image constituting the four-dimensional difference image data from a predetermined direction.
  • Each cube shown in the image 33 represents a voxel in which the difference value of the voxel values of the three-dimensional voxel data between frames is non-zero. That is, each cube (a voxel with a non-zero difference value) indicates a voxel whose state has changed, either from a state in which no object is present in the voxel to a state in which an object is present, or vice versa. In other words, each cube shows the position of a voxel where the object moved. Voxels whose difference value is 0 are not shown in the image 33; they indicate positions where no movement occurred.
  • For each of the masks, the feature amount extraction unit 13 scans the mask over the four-dimensional difference image data and, at each scanning position, calculates the sum over the four-dimensional difference image data of the product of the voxel values of the four-dimensional difference image data at the data positions specified by the mask; a feature quantity vector whose elements are the sums calculated for the respective masks is extracted as the feature quantity of the four-dimensional difference image data.
  • HLAC (Higher-order Local Auto-Correlation) features are based on the N-th order autocorrelation
    R_N(a_1, ..., a_N) = ∫ I(r) I(r + a_1) ... I(r + a_N) dr,
    where r is the position vector and a_1, ..., a_N are displacement vectors. r and the a_i are two-dimensional vectors in the case of HLAC and three-dimensional vectors in the case of CHLAC.
  • R_N can take a plurality of different values by changing the combination of a_1, ..., a_N, and a feature vector is constructed by using these values as its elements.
  • The dimension of the feature vector is 35 in HLAC and 279 in (3D) CHLAC.
  • FIG. 2 shows combinations of 35 displacement vectors in HLAC.
  • HLAC is applied to a four-dimensional array-like function (four-dimensional voxel data) I (x, y, z, t).
  • the four-dimensional array of functions I (x, y, z, t) is the four-dimensional difference image data output by the inter-frame difference unit 12 in the configuration of the moving image identification device 100 shown in FIG.
  • However, the four-dimensional array function I(x, y, z, t) is not limited to this; any time-series data of three-dimensional point cloud data may be used. For example, the four-dimensional array function I(x, y, z, t) may be the four-dimensional point cloud data itself, or it may be four-dimensional voxel data.
  • FIG. 3 is a diagram schematically illustrating a feature amount calculation process by a four-dimensionally expanded HLAC (hereinafter referred to as “4D-HLAC”).
  • FIG. 3A shows an example of the four-dimensional point cloud data 300.
  • the four-dimensional point group data 300 includes a plurality of three-dimensional point group data 301 to 303. Each pixel of each three-dimensional point cloud data has a pixel value.
  • FIG. 3B shows an example of a 4D-HLAC position vector and displacement vector.
  • In this example, a mask 310 having a size of 3 × 3 × 3 × 3 is assumed. The pixel 312a at the center of the mask 310 indicates the position of the position vector r, and the hatched pixels at positions other than the center indicate the positions designated by the displacement vectors.
  • The sub-masks 311, 312, and 313 constituting the mask 310 are superimposed on the same positions of the three-dimensional point cloud data 301, 302, and 303, respectively, and the product of the pixel values at the positions of the pixels 311a, 312a, and 313a is calculated.
  • The sum of the products calculated in this way over the entire four-dimensional point cloud data 300 is then computed. Since such a sum is calculated for each mask, the feature amount of the four-dimensional point cloud data 300 is obtained as a feature amount vector whose elements are the sums calculated for the respective masks.
  • The mask 310 is scanned by applying it at every position of the four-dimensional point cloud data 300 while shifting it by one pixel (one voxel) at a time in the x-axis, y-axis, z-axis, or t-axis direction.
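  • As a concrete illustration of the scanning and product-sum operation in FIG. 3, the following sketch computes a 4D-HLAC feature vector for a binary four-dimensional array. It is a naive reference implementation under assumed names; the three example masks at the end are placeholders, not the actual set of independent mask patterns used by the method.

```python
import numpy as np

def hlac4d_features(I, masks):
    """I     : 4D binary array indexed as I[x, y, z, t].
    masks : list of masks, each a list of 4D offsets (dx, dy, dz, dt) relative
            to the pixel of interest; (0, 0, 0, 0) is always included.

    Returns one sum of products per mask, i.e. a 4D-HLAC feature vector.
    """
    feats = np.zeros(len(masks))
    # Scan only positions where a 3x3x3x3 mask fits entirely inside the data.
    ranges = [range(1, n - 1) for n in I.shape]
    for m, offsets in enumerate(masks):
        total = 0
        for x in ranges[0]:
            for y in ranges[1]:
                for z in ranges[2]:
                    for t in ranges[3]:
                        prod = 1
                        for dx, dy, dz, dt in offsets:
                            prod *= I[x + dx, y + dy, z + dz, t + dt]
                            if prod == 0:   # a zero factor makes the product zero
                                break
                        total += prod
        feats[m] = total
    return feats

# Illustrative masks only (hypothetical; not the full independent mask set):
example_masks = [
    [(0, 0, 0, 0)],                               # 0th order: counts non-zero voxels
    [(0, 0, 0, 0), (1, 0, 0, 0)],                 # 1st order: neighbor along x
    [(0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1)],   # 2nd order: neighbors along y and t
]
```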
  • Extending HLAC to four dimensions makes it possible to extract features from 4D voxel data.
  • By performing pattern recognition using four-dimensional voxel data, the following properties appear that do not exist in pattern recognition based on moving images.
  • Position invariance: in the case of moving images, position invariance holds for movement parallel to the screen, but in the depth direction the apparent size on the image changes, so position invariance does not hold. If three-dimensional information is used, position invariance holds equally in the depth direction.
  • A moving image that can be acquired by a camera or the like is formed by detecting light reflected from the target object, so the information obtained directly is color or luminance information.
  • A certain amount of geometric information can be obtained indirectly by applying processing such as edge detection to the moving image, but the result is influenced by the color of the object.
  • In contrast, the information obtained from a three-dimensional measuring instrument such as a laser range finder is direct geometric information, and it is not affected by the color of the object.
  • a moving image of a three-dimensional image having depth information can eliminate the background by limiting the recognition area in the depth direction.
  • Furthermore, the data can be rotated about an arbitrary axis. As a result, even if data is acquired from only one direction, data virtually acquired from any other direction can be generated by rotating and replicating it.
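  • The rotation-based replication mentioned above could, for instance, be applied to the point cloud before voxelization; the axis and angle below are arbitrary illustrative choices:

```python
import numpy as np

def rotate_points_z(points, angle_rad):
    """Rotate an (N, 3) point cloud about the z axis; other axes are analogous."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T
```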
  • HLAC is often applied to a binary function I (r) that takes only 0 or 1 as in an edge image.
  • In this case, a combination containing the same displacement vector more than once outputs the same value as the combination obtained by removing the duplicated displacement vectors.
  • the mask 201, the mask 202, and the mask 203 output the same value.
  • the same value is output between the mask 204 and the mask 205. Therefore, the number of independent feature vector elements is smaller than when HLAC is applied to the multi-value function I (r), which is 25 for HLAC and 251 for CHLAC. In 4D-HLAC, the number of elements of independent feature vectors is 2481.
  • The 4D-HLAC feature value is obtained by summing local patterns. For this reason, it has the property that the same value is output even if the position at which the object appears changes (position invariance), and the property that, if multiple objects are present in the data, the feature value of the whole equals the sum of the feature values of the individual objects (additivity).
  • Since the 4D-HLAC feature quantity can be calculated using only products and sums, the calculation cost is low and it is suitable for real-time processing.
  • The 4D-HLAC feature quantity is a model-free feature quantity that can be applied to various objects, and the feature quantity vector always has a fixed length regardless of the data, so it can readily be combined with various pattern recognition methods.
  • The feature quantity extraction unit 13 scans masks such as the mask 310 shown in FIG. 3B over the four-dimensional difference image data output from the inter-frame difference unit 12, and extracts a feature vector as the feature value of the four-dimensional difference image data by performing a product-sum operation on the binary voxel values.
  • the pattern identification unit 20 identifies the target motion based on the feature amount of the four-dimensional difference image data extracted by the feature amount extraction unit 13, and outputs the identification result.
  • Although the pattern identification method is not limited, in this embodiment pattern identification using Fisher's linear discriminant analysis is used as an example.
  • FIG. 4 is a diagram for explaining the principle of the process of estimating the number of operations by the pattern identification unit 20.
  • As shown in FIG. 4(a), suppose for example that three motions, "turn forward (Forward)", "turn backward (Backward)", and "up and down (Up Down)", are learned, and let the 4D-HLAC feature quantities corresponding to these motions be m_1, m_2, and m_3.
  • Let the 4D-HLAC feature quantity obtained from the input four-dimensional difference image data be x (FIG. 4(b)). At this time, x can be expressed by a weighted linear sum of m_1, m_2, and m_3, as shown in FIG. 4(c).
  • Here, a_1, a_2, and a_3 represent the number of each motion being performed.
  • In practice, a plurality of samples of m_1, m_2, and m_3 are acquired.
  • The dimension of these acquired feature quantities is reduced to three, and the feature quantities representing the motions are denoted m'_1, m'_2, and m'_3 as shown in FIG. 4(d). If the dimension-reduced feature vector is x', the number of each motion can be calculated by the equation shown in FIG. 4(e). Such processing is described in detail next.
  • Fisher's linear discrimination is a technique for reducing dimensions while maintaining the class structure of data.
  • The centroid v_k of each class after mapping and the centroid v of all the data can be expressed by Expression 4 and Expression 5.
  • Here, N_k is the number of data in each class, and N is the total number of data.
  • The intra-class variance s_W and the inter-class variance s_B after mapping are obtained by Expression 6 and Expression 7, respectively.
  • Mapping is performed according to Expression 2 using this W, and the dimension is reduced. Then the norm of the difference from each class centroid v_k is computed, and the data is assigned to the class at the shortest distance.
  • The zero vector added here represents the state in which no motion is being performed; it is essential so that the case in which the number of every motion is 0 can be expressed when estimating the number of each motion. Further, by adding feature vectors of noise obtained from the environment as the (K+1)-th class, a subspace is expected to be obtained in which the environmental noise component overlaps the origin and is thereby removed. These processes do not increase the number of motions, but they do increase the dimension of the subspace. Therefore, the number a of each motion can be obtained by Expression 17, which is obtained by modifying Expression 16.
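  • A minimal sketch of the counting step outlined above, under the interpretation (not the patent's exact Expressions 16 and 17) that the dimension-reduced feature x' is modeled as a weighted sum of the dimension-reduced per-motion features and that the weights are recovered by least squares; the projection matrix W is assumed to come from a standard Fisher linear discriminant analysis implementation:

```python
import numpy as np

def estimate_motion_counts(x, W, motion_means):
    """x            : raw 4D-HLAC feature vector of the current window
    W            : (d, d') projection matrix from Fisher's linear discriminant
    motion_means : (K, d) raw mean feature vectors m_1 ... m_K of the K motions

    Returns the estimated number a_1 ... a_K of each motion, as in FIG. 4.
    """
    x_red = W.T @ x                    # x'  = W^T x
    M_red = W.T @ motion_means.T       # columns are m'_1 ... m'_K
    a, *_ = np.linalg.lstsq(M_red, x_red, rcond=None)
    return a
```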
  • FIG. 6A is a diagram illustrating an example of a luminance image.
  • FIG. 6B is a diagram illustrating an example of a depth image.
  • FIG. 6C is a diagram illustrating an example of an image of the three-dimensional voxel data output from the voxel conversion unit 11.
  • The labeling coincidence rate obtained when the data of M-1 people is used as learning data and the data of the remaining one person is used as verification data is defined as the discrimination rate.
  • For comparison, identification was also performed using a luminance video and a depth video. Since both of these can be handled as three-dimensional array data, feature extraction by the conventional method CHLAC is possible. Only the feature extraction process is replaced; the other processes and conditions are kept the same.
  • In the verification of the feature quantity extraction process, feature vectors extracted from images obtained by taking time differences of the two types of original moving images (the luminance moving image and the depth moving image) obtained from the Kinect sensor are used. Since three motions that are difficult to discriminate from the front in the luminance video were chosen, the identification rate based on the luminance video is expected to drop, whereas the identification rate based on the depth video, which contains three-dimensional information, is expected to be high.
  • the motion identification rate using the feature amount extracted from the four-dimensional difference image data using 4D-HLAC is 98.2%.
  • In contrast, the motion discrimination rate using the feature amount extracted from the luminance video with CHLAC is 63.5%, and the motion discrimination rate using the feature amount extracted from the depth video with CHLAC is 75.8%; both are inferior to this method.
  • In particular, the identification rate based on the depth video is inferior to this method even though it uses three-dimensional information.
  • The difference between the method using the depth moving image and this method is whether features are extracted from moving image data (3D array data) or from voxel moving image data (4D array data); both use only depth information. Nevertheless, the reason why the recognition rate of this method is higher can be considered as follows.
  • In the voxel representation, the position of the object in the depth direction appears as the position of a voxel, just as the left-right and up-down positions do, but in the depth image it appears as two kinds of change: a change of the pixel value and a change of the apparent size of the target object. Therefore, the feature extracted from the depth moving image changes with the position in the depth direction, which may adversely affect identification.
  • HLAC is a feature extraction method based on local patterns.
  • In a depth image, the three-dimensional information is preserved by expressing depth as a pixel value, but because the pixels are arranged in a two-dimensional array, objects that are separated in three dimensions can be adjacent on the image, just as in an ordinary image. As a result, information about the relation between such objects is mixed into the feature of the shape of each object, which may adversely affect identification.
  • In contrast, when the depth information is handled as three-dimensional voxel data as in this method, an object at a distant location is also distant in the three-dimensional voxel data. For this reason, when feature extraction is performed by 4D-HLAC, non-adjacent objects have independent feature values. Converting the depth information into three-dimensional voxel data before performing feature extraction can thus be said to extract the essential features of the three-dimensional information.
  • FIG. 9 (a) is a graph showing the actual number of each of the three movements.
  • FIG. 9B is a graph showing estimation results of the numbers of the three movements.
  • FIG. 9C is a graph showing the result of calculating the simple moving average of the numbers of the three movements shown in FIG. 9B and rounding off the calculated simple moving average.
  • the horizontal axis indicates the number of frames, and the vertical axis indicates the number of movements.
  • the estimation result shown in FIG. 9B starts from the 20th frame because the feature amount of 4D-HLAC cannot be obtained until 20 frames of data are collected.
  • FIG. 9B shows that the estimation result contains many noise components, but that an approximate number can be estimated. The many noise components are thought to arise because the target motions are periodic. For this reason, as shown in FIG. 9C, the result can be improved by calculating a 20-frame simple moving average and rounding it off.
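  • The smoothing used for FIG. 9C amounts to the following (a 20-frame simple moving average followed by rounding; an illustrative sketch):

```python
import numpy as np

def smooth_counts(counts, window=20):
    """counts : (T, K) array of per-frame estimated numbers of each motion."""
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, counts)
    return np.rint(smoothed)           # round to the nearest whole number
```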
  • Process 1 is a process in which the voxel conversion unit 11 generates time-series three-dimensional voxel data (four-dimensional voxel data), and Process 2 includes the inter-frame difference unit 12, the feature amount extraction unit 13, and the pattern identification unit 20.
  • The feature extraction process by 4D-HLAC, which takes more time than the other processes, is placed in Process 2 so that the high-speed capture of the four-dimensional point cloud data is not slowed down.
  • the voxel conversion unit 11 generates time-series three-dimensional voxel data (four-dimensional voxel data) by iteratively processing according to the capture speed according to the following procedure.
  • the voxel conversion unit 11 acquires three-dimensional information (three-dimensional point cloud data) from the Kinect sensor.
  • the voxel conversion unit 11 converts the acquired three-dimensional information into three-dimensional voxel data.
  • the voxel conversion unit 11 writes and updates one frame of three-dimensional voxel data in a time-series voxel data buffer included in the voxel conversion unit 11.
  • the inter-frame difference unit 12, the feature amount extraction unit 13, and the pattern identification unit 20 repeatedly identify motion from the time-series three-dimensional voxel data generated by the voxel conversion unit 11 according to the following procedure.
  • the interframe difference unit 12 acquires three-dimensional voxel data from the time-series voxel data buffer.
  • the inter-frame difference unit 12 calculates four-dimensional difference image data that is time-series data of the difference image by calculating a difference value of each voxel value of the three-dimensional voxel data between adjacent frames.
  • the feature quantity extraction unit 13 extracts a 4D-HLAC feature quantity from the four-dimensional difference image data.
  • the pattern identification unit 20 identifies the target motion using the 4D-HLAC feature quantity extracted by the feature quantity extraction unit 13.
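  • One way to realize the Process 1 / Process 2 split described above is with two threads sharing a buffer of the most recent voxel frames. The structure below is a sketch of that arrangement (sensor access, voxelization, feature extraction, and identification are passed in as stubs; all names are assumptions):

```python
import threading
import time
import numpy as np
from collections import deque

buffer_lock = threading.Lock()
voxel_buffer = deque(maxlen=20)          # the last 20 frames used by 4D-HLAC

def process1(get_point_cloud, voxelize):
    """Process 1: runs at the capture rate; acquire, voxelize, update the buffer."""
    while True:
        points = get_point_cloud()       # one 3D point cloud from the sensor
        grid = voxelize(points)          # one binary 3D voxel frame
        with buffer_lock:
            voxel_buffer.append(grid)

def process2(frame_difference, hlac4d_features, identify, masks):
    """Process 2: inter-frame difference, 4D-HLAC extraction, identification."""
    while True:
        with buffer_lock:
            frames = list(voxel_buffer)
        if len(frames) < voxel_buffer.maxlen:
            time.sleep(0.01)             # wait until enough frames are buffered
            continue
        diff = frame_difference(np.stack(frames))            # (T-1, X, Y, Z)
        feats = hlac4d_features(diff.transpose(1, 2, 3, 0), masks)
        identify(feats)
```
  • Running the two loops in separate threads keeps the slower 4D-HLAC step from throttling the capture, which is the purpose of the split described in the text.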
  • Table 3 shows the parameters for real-time identification.
  • 4D-HLAC has the characteristic that the processing content does not change depending on the type or number of objects. However, when 4D-HLAC is applied to binary data as in this method, if any position specified by the mask (the position vector or a displacement vector) has a voxel value of 0, the product of the voxel values is 0, so the calculation for that mask can be skipped. For this reason, the more voxels whose value is 0, the lower the processing cost. In the present embodiment, the difference of voxel values between adjacent frames is used, so the number of non-zero voxels depends on the amount of motion.
  • As a result, the processing speed of Process 2 varies.
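  • The cost-skipping argument above can be exploited directly: since every mask includes the pixel of interest, positions whose own value is 0 can be skipped outright. A sketch of that variant of the scan (same assumed interface as the earlier hlac4d_features sketch):

```python
import numpy as np

def hlac4d_features_sparse(I, masks):
    """Same result as the dense scan, but visits only non-zero voxels."""
    feats = np.zeros(len(masks))
    for x, y, z, t in np.argwhere(I != 0):
        # Skip border positions where a 3x3x3x3 mask would not fit.
        if min(x, y, z, t) == 0 or any(c == n - 1 for c, n in zip((x, y, z, t), I.shape)):
            continue
        for m, offsets in enumerate(masks):
            prod = 1
            for dx, dy, dz, dt in offsets:
                prod *= I[x + dx, y + dy, z + dz, t + dt]
                if prod == 0:
                    break
            feats[m] += prod
    return feats
```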
  • The difference processing by the inter-frame difference unit 12, which is included in Process 2, could preferably be placed in Process 1, since the processing would then not be duplicated; however, that would reduce the capture speed of 30 frames per second, so it is included in Process 2.
  • feature quantities can be extracted from the four-dimensional point cloud data.
  • the four-dimensional point cloud data includes information about the depth direction of the target and information about the temporal movement of the target.
  • The processing of the feature amount extraction unit is not limited to a specific target. For this reason, it is possible to extract feature quantities that can identify the movement of a target with high accuracy without limiting the identification target. For example, gesture recognition may be performed by preparing, as a template, the feature amount obtained when a specific gesture is performed and comparing the template with the feature amount obtained from the input four-dimensional point cloud data. Further, the device may be used as part of a crime-prevention system that extracts the principal components of the feature amounts of a large amount of data and detects, as abnormal information, feature amounts that cannot be explained by those principal components.
  • the number of voxels included in the 4D voxel data is smaller than the number of pixels included in the 4D point cloud data.
  • the value of each voxel included in the four-dimensional voxel data is binary. For this reason, the data size of the four-dimensional voxel data is smaller than the data size of the four-dimensional point cloud data. Therefore, if the feature amount extracted by this feature amount extraction apparatus is used, the movement of the target can be identified at high speed.
  • the difference in each voxel value of the 3D voxel data between frames indicates whether or not there is a change in each voxel.
  • By using the feature amount extracted from the four-dimensional difference image data, it is possible to identify the target motion with high accuracy.
  • the moving image identification device 100 and the feature quantity extraction device 10 according to the embodiment of the present invention have been described above, but the present invention is not limited to this embodiment.
  • each pixel value of the three-dimensional point cloud data may represent luminance, or may represent the presence probability of a target at a three-dimensional position corresponding to the pixel.
  • In the above embodiment, the inter-frame difference unit 12 computes the difference of the three-dimensional voxel data between adjacent frames, but it may instead compute a background difference, using as the background the three-dimensional voxel data generated from a background image in which the pattern identification target does not appear.
  • The voxel conversion unit 11 and the inter-frame difference unit 12 are optional constituent elements; they may or may not be provided in the feature quantity extraction device 10.
  • the feature quantity extraction device 10 may include only the feature quantity extraction unit 13.
  • the feature amount extraction unit 13 extracts a 4D-HLAC feature amount from the four-dimensional point group data input to the moving image identification device 100.
  • the feature quantity extraction apparatus 10 includes the voxel conversion unit 11 and the feature quantity extraction unit 13 and may not include the inter-frame difference unit 12.
  • the feature amount extraction unit 13 extracts a 4D-HLAC feature amount from the four-dimensional voxel data generated by the voxel conversion unit 11.
  • the feature quantity extraction apparatus 10 includes the inter-frame difference unit 12 and the feature quantity extraction unit 13 and may not include the voxel conversion unit 11.
  • In that case, the inter-frame difference unit 12 calculates four-dimensional difference image data, which is time-series data of difference images having the difference values as pixel values, by calculating the difference value between frames of the three-dimensional point cloud data constituting the four-dimensional point cloud data input to the moving image identification device 100.
  • the feature amount extraction unit 13 extracts a 4D-HLAC feature amount from the four-dimensional difference image data calculated by the inter-frame difference unit 12.
  • each of the above devices may be specifically configured as a computer system including a microprocessor, ROM, RAM, hard disk drive, display unit, keyboard, mouse, and the like.
  • a computer program is stored in the RAM or hard disk drive.
  • Each device achieves its functions by the microprocessor operating according to the computer program.
  • the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.
  • The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip, and is specifically a computer system including a microprocessor, a ROM, a RAM, and the like.
  • a computer program is stored in the RAM.
  • the system LSI achieves its functions by the microprocessor operating according to the computer program.
  • each of the above-described devices may be configured from an IC card or a single module that can be attached to and detached from each device.
  • the IC card or module is a computer system that includes a microprocessor, ROM, RAM, and the like.
  • the IC card or the module may include the super multifunctional LSI described above.
  • the IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
  • the present invention may be the method described above. Further, the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of the computer program.
  • The present invention may also be the computer program or the digital signal recorded on a computer-readable non-transitory recording medium, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc (registered trademark)), or a semiconductor memory.
  • the digital signal may be recorded on these non-temporary recording media.
  • the computer program or the digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, data broadcasting, or the like.
  • the present invention may also be a computer system including a microprocessor and a memory.
  • the memory may store the computer program, and the microprocessor may operate according to the computer program.
  • the present invention can be applied to a feature amount extraction apparatus that extracts a feature amount from a time-series image of a three-dimensional image, and in particular, can be applied to a moving image identification apparatus that performs pattern identification using the extracted feature amount.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a feature value extraction device (10) for extracting a feature value of 4D point group data, which is time-series data of 3D point group data, the device comprising: a feature value extraction unit (13) which, for each mask designating a data location of at least one data instance that includes data of interest, while scanning the mask over the 4D point group data, calculates at each scanning location a sum, over the 4D point group data, of a product of pixel values of the 4D point group data at the data locations designated by the mask, and extracts, as the feature value of the 4D point group data, a feature value vector having the sum calculated for each mask as an element thereof. For each mask, no other matching mask exists when it is translated in parallel in any of the 4D directions.
PCT/JP2013/000635 2012-07-03 2013-02-06 Feature value extraction device and method WO2014006786A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-149702 2012-07-03
JP2012149702A JP6052533B2 (ja) 2012-07-03 2012-07-03 特徴量抽出装置および特徴量抽出方法

Publications (1)

Publication Number Publication Date
WO2014006786A1 true WO2014006786A1 (fr) 2014-01-09

Family

ID=49881564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/000635 WO2014006786A1 (fr) 2013-02-06 2012-07-03 Feature value extraction device and method

Country Status (2)

Country Link
JP (1) JP6052533B2 (fr)
WO (1) WO2014006786A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320322A (zh) * 2018-02-11 2018-07-24 腾讯科技(成都)有限公司 动画数据处理方法、装置、计算机设备和存储介质
CN110392193A (zh) * 2019-06-14 2019-10-29 浙江大学 一种掩膜板相机的掩膜板
WO2021019906A1 (fr) * 2019-07-26 2021-02-04 パナソニックIpマネジメント株式会社 Appareil de télémétrie, procédé de traitement d'informations et appareil de traitement d'informations
CN112418105A (zh) * 2020-11-25 2021-02-26 湖北工业大学 基于差分方法的高机动卫星时间序列遥感影像运动舰船目标检测方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6888386B2 (ja) * 2017-04-17 2021-06-16 富士通株式会社 差分検知プログラム、差分検知装置、差分検知方法
JP7323235B2 (ja) * 2020-03-25 2023-08-08 Necソリューションイノベータ株式会社 画像追跡装置、画像追跡方法、及びプログラム

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011248664A (ja) * 2010-05-27 2011-12-08 Panasonic Corp 動作解析装置および動作解析方法

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011248664A (ja) * 2010-05-27 2011-12-08 Panasonic Corp 動作解析装置および動作解析方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAZUFUMI SUZUKI ET AL.: "A Solid Texture Classification Method Based on 3D Mask Patterns of Higher Order Local Autocorrelations", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 48, no. 3, March 2007 (2007-03-01), pages 1524 - 1531 *
SHO IKEMURA ET AL.: "Jikukan Joho to Kyori Joho o Mochiita Joint Boosting ni yoru Dosa Shikibetsu", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAN C, vol. 130, no. 9, 1 October 2010 (2010-10-01), pages 1554 - 1560 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320322A (zh) * 2018-02-11 2018-07-24 腾讯科技(成都)有限公司 动画数据处理方法、装置、计算机设备和存储介质
CN108320322B (zh) * 2018-02-11 2021-06-08 腾讯科技(成都)有限公司 动画数据处理方法、装置、计算机设备和存储介质
CN110392193A (zh) * 2019-06-14 2019-10-29 浙江大学 一种掩膜板相机的掩膜板
WO2021019906A1 (fr) * 2019-07-26 2021-02-04 パナソニックIpマネジメント株式会社 Appareil de télémétrie, procédé de traitement d'informations et appareil de traitement d'informations
CN112418105A (zh) * 2020-11-25 2021-02-26 湖北工业大学 基于差分方法的高机动卫星时间序列遥感影像运动舰船目标检测方法

Also Published As

Publication number Publication date
JP2014013432A (ja) 2014-01-23
JP6052533B2 (ja) 2016-12-27

Similar Documents

Publication Publication Date Title
CN107466411B (zh) 二维红外深度感测
CN108292362B (zh) 用于光标控制的手势识别
Aggarwal et al. Human activity recognition from 3d data: A review
JP5726125B2 (ja) 奥行き画像内の物体を検出する方法およびシステム
US9633483B1 (en) System for filtering, segmenting and recognizing objects in unconstrained environments
US9098740B2 (en) Apparatus, method, and medium detecting object pose
KR101283262B1 (ko) 영상 처리 방법 및 장치
Holte et al. View-invariant gesture recognition using 3D optical flow and harmonic motion context
JP6052533B2 (ja) 特徴量抽出装置および特徴量抽出方法
TWI394093B (zh) 一種影像合成方法
Atrevi et al. A very simple framework for 3D human poses estimation using a single 2D image: Comparison of geometric moments descriptors
Ahmad et al. Using discrete cosine transform based features for human action recognition
JP2016099982A (ja) 行動認識装置、行動学習装置、方法、及びプログラム
CN110751097B (zh) 一种半监督的三维点云手势关键点检测方法
Rahman et al. Recognising human actions by analysing negative spaces
US11823394B2 (en) Information processing apparatus and method for aligning captured image and object
JP7282216B2 (ja) 単眼スチルカメラのビデオにおけるレイヤードモーションの表現と抽出
JP2016014954A (ja) 手指形状の検出方法、そのプログラム、そのプログラムの記憶媒体、及び、手指の形状を検出するシステム。
KR101478709B1 (ko) Rgb-d 영상 특징점 추출 및 특징 기술자 생성 방법 및 장치
CN111353069A (zh) 一种人物场景视频生成方法、系统、装置及存储介质
Planinc et al. Computer vision for active and assisted living
López-Fernández et al. independent gait recognition through morphological descriptions of 3D human reconstructions
Li et al. Real-time action recognition by feature-level fusion of depth and inertial sensor
JP6393495B2 (ja) 画像処理装置および物体認識方法
Kapuscinski et al. Recognition of signed dynamic expressions observed by ToF camera

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13812917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13812917

Country of ref document: EP

Kind code of ref document: A1