WO2021130864A1 - Feature learning system, feature learning method, and program - Google Patents

Feature learning system, feature learning method, and program

Info

Publication number
WO2021130864A1
WO2021130864A1 PCT/JP2019/050642 JP2019050642W WO2021130864A1 WO 2021130864 A1 WO2021130864 A1 WO 2021130864A1 JP 2019050642 W JP2019050642 W JP 2019050642W WO 2021130864 A1 WO2021130864 A1 WO 2021130864A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
similarity
learning
learning data
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2019/050642
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
諒 川合
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to US17/785,554 (US20230012026A1)
Priority to JP2021566607A (JP7367775B2)
Priority to PCT/JP2019/050642 (WO2021130864A1)
Publication of WO2021130864A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/76 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries based on eigen-space representations, e.g. from pose or different illumination conditions; Shape manifolds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Definitions

  • the present invention relates to a system, method and program for efficiently learning the behavior of a person in a video.
  • a method using principal component analysis or the final layer of a deep learning model can be considered.
  • distance learning (metric learning)
  • Distance learning focuses not on the feature quantity itself but on the distance between feature quantities in the vector space, and proceeds with learning so as to construct a feature space in which similar behaviors are placed close to each other and different behaviors are placed far apart.
  • "smartphone walking": a walking motion while using a smartphone or the like
  • "downward walking": a walking motion while simply looking down
  • In Patent Document 1, when selecting job seekers' resumes that meet the conditions of a company's job vacancy, the keywords in the documents are grouped into several topics and learning is performed based on those topics, which makes it possible to extract the target resumes with high accuracy even from a small number of learning documents.
  • One of the objects of the present invention is to provide a technique for reducing the time required for learning and identifying a person's behavior.
  • the feature learning system of the present invention comprises: a similarity definition means that defines the similarity between two classes corresponding to each of two feature vectors; a learning data generation means that acquires the similarity based on the combination of classes to which each of a plurality of feature vectors acquired as a processing target belongs, and generates learning data including the plurality of feature vectors and the similarity; and a learning means that carries out machine learning using the learning data.
  • in the feature learning method of the present invention, a computer: defines the similarity between two classes corresponding to each of two feature vectors; acquires the similarity based on the combination of classes to which each of a plurality of feature vectors acquired as a processing target belongs; generates learning data including the plurality of feature vectors and the similarity; and performs machine learning using the learning data.
  • the program of the present invention causes a computer to execute the above-mentioned feature learning method.
  • According to the first problem-solving means of the present invention, a technique for reducing the time required for learning and identifying a person's behavior is provided.
  • FIG. 1 is a diagram illustrating the configuration of the feature learning system of the first embodiment. FIG. 2 is a diagram showing an example of the information stored in the feature DB. FIG. 3 is a diagram for explaining an operation example of the similarity definition unit. FIG. 4 is another diagram for explaining an operation example of the similarity definition unit. FIG. 5 is a diagram showing an example of the information stored in the similarity DB. FIG. 6 is a diagram showing an example of the information stored in the similarity DB. FIG. 7 is a diagram showing an example of the information stored in the learning DB. FIG. 8 is a diagram showing another example of the information stored in the learning DB. A further figure is a block diagram illustrating the hardware configuration of the feature learning system.
  • each block diagram represents a configuration of a functional unit, not a configuration of a hardware unit.
  • the direction of the arrow in the figure is for making the flow of information easy to understand, and does not limit the direction of communication (one-way communication / two-way communication) unless otherwise specified.
  • the feature learning system extracts behavioral features from, for example, sensor information, and then determines the degree of similarity from a combination of behavioral features to be learned.
  • the combination of behavioral features and the degree of similarity are stored in a learning database (hereinafter referred to as "learning DB") in a state of being associated with each other, for example.
  • the feature learning system refers to the degree of similarity when it performs learning. As a result, behaviors that resemble one another to different degrees can be learned with their similarity taken into account, so learning can proceed more stably.
  • FIG. 1 is a diagram illustrating the configuration of the feature learning system 100 of the first embodiment.
  • the feature learning system 100 illustrated in FIG. 1 includes a feature database (hereinafter referred to as "feature DB") 111, a similarity definition unit 101, a similarity database (hereinafter referred to as "similarity DB") 112, a learning data generation unit 102, a learning DB 113, and a learning unit 103.
  • these components may be provided in one device (computer), or may be distributed and provided in a plurality of devices (computers). In the following description, it is assumed that one device (computer) includes all the components of the feature learning system 100.
  • the feature DB 111 stores a plurality of behavior features together with class information corresponding to each behavior feature.
  • a behavioral feature is information indicating a feature of a person's behavior, and is represented by, for example, a vector in a certain feature space. Behavioral features are generated based on information obtained by sensors such as visible light cameras, infrared cameras, and depth sensors (hereinafter also referred to as "sensor information").
  • a behavioral feature includes information such as sensor information obtained by sensing an area in which a person taking an action is present, skeleton information of the person generated based on the sensor information, or a conversion of these using a predetermined function. However, behavioral features may include other information. Existing methods can be used to generate and acquire behavioral features.
  • Class information is information that indicates what kind of behavior a behavioral feature relates to, that is, the type of behavior.
  • Class information is, for example, manually input via an input device (not shown).
  • the class information may be given to each of the behavioral features acquired as described above by using a learning model trained to classify each behavioral feature into the corresponding class.
  • FIG. 2 is a diagram showing an example of information stored in the feature DB 111.
  • the feature DB 111 stores class information indicating the type of action (for example, class 0) in association with the behavioral features corresponding to that class (the position coordinates of each feature point of the person when the action is taken).
  • the similarity definition unit 101 defines the similarity between two classes corresponding to each of the two behavioral features and stores it in the similarity DB 112.
  • the degree of similarity of behavioral characteristics is represented by, for example, a numerical value of 0 or more and 1 or less. In this case, the larger the value (the closer the value is to 1), the more similar the two behavioral characteristics that form a pair are.
  • Several methods can be considered for defining the similarity in the similarity definition unit 101. Broadly speaking, there is a method of determining the similarity collectively for each class of behavior and a method of determining the similarity individually for each behavioral feature. When the similarity is determined individually for each behavioral feature, the similarity definition unit 101 defines a calculation formula for obtaining the similarity.
  • the similarity definition unit 101 can define the similarity for each combination of classes as follows, for example. The operation described below is merely an example, and the operation of the similarity definition unit 101 is not limited to the following example.
  • the similarity definition unit 101 takes out the behavioral feature stored in the feature DB 111.
  • the similarity definition unit 101 classifies each of the behavioral features extracted from the feature DB 111 into corresponding classes using, for example, a learning model constructed by machine learning.
  • the similarity definition unit 101 performs principal component analysis on the behavioral features in each class, and obtains an eigenvector for the obtained first principal component.
  • the similarity s_ij between class i and class j is defined using the eigenvectors v_i and v_j of class i and class j, as the cosine of the angle between v_i and v_j normalized so as to satisfy the condition on the similarity (a value of 0 or more and 1 or less).
  • the similarity definition unit 101 stores all s_ij obtained when i and j are varied over the range [1, n] in the similarity DB 112.
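The principal-component procedure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the equation itself is not reproduced in this text, so the normalization `(1 + cos) / 2` is an assumption chosen only because it maps the cosine into the range 0 to 1. The function names, the power-iteration stand-in for principal component analysis, and the sample data are all hypothetical.

```python
import math


def first_principal_axis(samples, iterations=200):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue.

    Power iteration is used only to keep the sketch dependency-free;
    any principal component analysis routine could take its place.
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(row[c] for row in samples) / n for c in range(dim)]
    centered = [[row[c] - mean[c] for c in range(dim)] for row in samples]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    # Arbitrary non-uniform start vector (assumption); it must not be
    # orthogonal to the principal axis for the iteration to find it.
    vec = [1.0 / (c + 1) for c in range(dim)]
    start_norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / start_norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def class_similarity(v_i, v_j):
    """Assumed normalization: cosine of the angle, rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norms = math.sqrt(sum(a * a for a in v_i)) * math.sqrt(sum(b * b for b in v_j))
    return 0.5 * (1.0 + dot / norms)


# Hypothetical two-dimensional behavioral features for two classes.
class_0 = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # varies along axis 0
class_1 = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]              # varies along axis 1
v0 = first_principal_axis(class_0)
v1 = first_principal_axis(class_1)
s_01 = class_similarity(v0, v1)
```

Because an eigenvector is only defined up to sign, a practical version would need a sign convention (or the absolute value of the cosine); the sketch leaves that choice open.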
  • the similarity definition unit 101 can define the similarity for each combination of classes as follows, for example. The operation described below is merely an example, and the operation of the similarity definition unit 101 is not limited to the following example.
  • the similarity definition unit 101 extracts the same number of behavioral features for each class from the feature DB 111.
  • the similarity definition unit 101 further classifies the extracted behavioral features within the class. For example, the similarity definition unit 101 uses a part (the same number for each class) as an evaluation feature and the rest as a learning feature for the behavioral features extracted for each class.
  • the similarity definition unit 101 performs learning by a conventional method using the learning features, and then identifies the evaluation features with the obtained classifier (learning model). Then, the similarity definition unit 101 aggregates the identification results of the evaluation features for each class and, based on the result of the aggregation, calculates the ratio m_st at which a behavioral feature belonging to class s is recognized as a behavioral feature belonging to class t. In this case, the similarity s_ij between class i and class j is defined as equation (2) using the ratio m_ij at which behavioral features belonging to class i are recognized as behavioral features belonging to class j, and the ratio m_ji at which behavioral features belonging to class j are recognized as behavioral features belonging to class i.
  • for example, the similarity definition unit 101 can define the similarity s_ij between class i and class j as "0.15" by using equation (2).
  • the similarity definition unit 101 stores all s_ij obtained when i and j are varied over the range [1, n] in the similarity DB 112.
  • the degree of similarity may also be determined manually.
  • for example, the similarity between the normal walking motion and the motion of falling down is set to 0, and the similarity between smartphone walking and downward walking is set to 0.25.
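The misrecognition-based definition can be sketched as below. Equation (2) is not reproduced in this text, so the sketch assumes the similarity is the mean of the two cross-recognition ratios m_ij and m_ji; the function names and the label lists are hypothetical, and the rates 0.2 and 0.1 are illustrative values chosen so that the assumed mean gives 0.15.

```python
def misrecognition_rates(true_labels, predicted_labels, num_classes):
    """m[s][t]: share of evaluation features of class s recognized as class t."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for s, t in zip(true_labels, predicted_labels):
        counts[s][t] += 1
    rates = []
    for row in counts:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return rates


def confusion_similarity(m, i, j):
    """Assumed form of equation (2): mean of the two cross-recognition ratios."""
    if i == j:
        return 1.0  # assumption: a class is treated as fully similar to itself
    return 0.5 * (m[i][j] + m[j][i])


# Ten evaluation features per class: two of class 0 are recognized as class 1,
# and one of class 1 is recognized as class 0.
true_labels = [0] * 10 + [1] * 10
predicted_labels = [0] * 8 + [1] * 2 + [1] * 9 + [0] * 1
m = misrecognition_rates(true_labels, predicted_labels, num_classes=2)
s_01 = confusion_similarity(m, 0, 1)  # approximately 0.15
```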
  • the similarity definition unit 101 can define the similarity for each combination of classes as follows, for example. The operation described below is merely an example, and the operation of the similarity definition unit 101 is not limited to the following example. First, the similarity definition unit 101 displays a screen for setting the similarity for each combination of classes on a display (not shown) used by the person in charge of work. The worker inputs a numerical value to be set for each combination of classes on the screen displayed on the display.
  • the similarity definition unit 101 may classify a part or all of the behavioral features stored in the feature DB 111 into, for example, classes and display them on the display.
  • the worker can use the per-class classification results of the behavioral features displayed on the display as supporting information when determining the similarity of a combination of two different classes. For example, the worker can compare the behavioral features classified into the first class with those classified into the second class and decide the numerical value to be set as the similarity of the combination of the first and second classes.
  • if the similarity definition unit 101 does not have the function of displaying the classification results on the display as described above, the worker may input a numerical value to be set based on, for example, his or her own judgment. Then, the similarity definition unit 101 stores the numerical value input on the screen in the similarity DB 112 together with the information indicating the combination of classes.
  • the similarity definition unit 101 can define the similarity for each combination of behavioral features, for example, as follows. The operation described below is merely an example, and the operation of the similarity definition unit 101 is not limited to the following example.
  • the similarity definition unit 101 extracts all behavioral features from the feature DB 111 and performs principal component analysis.
  • the similarity definition unit 101 may reduce the dimension of the behavioral feature based on the result of the principal component analysis for each behavioral feature. Conventional methods can be used for dimensionality reduction.
  • the similarity definition unit 101 sets the similarity between the feature vectors obtained from the respective behavioral features as the similarity of the behaviors.
  • the similarity s_vw between the first behavioral feature V and the second behavioral feature W can be defined as equation (3) using the norm of the difference between the feature vector v of the first behavioral feature V and the feature vector w of the second behavioral feature W (the L2 norm may be used, but other norms may also be used).
  • alternatively, the similarity s_vw between the first behavioral feature V and the second behavioral feature W can be defined as equation (4) using the cosine of the angle formed by the feature vector v of the first behavioral feature V and the feature vector w of the second behavioral feature W.
  • the similarity DB 112 stores the conversion formula for dimensionality reduction and the definition formula of the similarity.
  • FIG. 3 is a diagram for explaining an operation example of the similarity definition unit 101.
  • FIG. 3 shows skeletal information of each of persons A and B normalized based on height as an example of behavioral characteristics.
  • points A_0 to A_13 and points B_0 to B_13 are the feature points of person A and person B, respectively.
  • the subscripts (0 to 13) correspond to the feature points of the person.
  • the subscript "0" is the head
  • the subscript "1" is the neck
  • the subscript "2" is the right shoulder joint
  • the subscript "3" is the right elbow joint
  • the subscript "4" is the right wrist joint.
  • each point can be said to be information indicating the skeleton of a person (skeleton information of the person). Each point may be defined in the camera coordinate system or in the world coordinate system.
  • the midpoint of both hip joints, that is, the midpoint of the line segment A_8 A_11 and of the line segment B_8 B_11, is set as the origin O.
  • let the vectors from the origin O toward the points A_0 to A_13 be a_0 to a_13, and let the vectors from the origin O toward the points B_0 to B_13 be b_0 to b_13.
  • twelve angles (subscripts 1 to 12) are defined for each of person A and person B as the angles formed by the line segments connecting the feature points, as shown in the figure.
  • the similarity definition unit 101 can convert the distance d between behavioral features into the similarity s based on, for example, the following equation (5). If the maximum value D of the distance d can be expected from physical constraints or the like, the similarity definition unit 101 can also calculate the similarity s based on the following equation (6).
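A distance-to-similarity conversion of this kind might look like the sketch below. Equations (5) and (6) are not reproduced in this text, so both function bodies are assumptions: any decreasing map from distance to the range 0 to 1 would fit the description, and the forms `1 / (1 + d)` and `1 - d / D` were picked only for simplicity. The function names are hypothetical.

```python
def similarity_unbounded(d):
    """Assumed stand-in for equation (5): maps d in [0, inf) to s in (0, 1]."""
    return 1.0 / (1.0 + d)


def similarity_bounded(d, d_max):
    """Assumed stand-in for equation (6): linear in d when the maximum distance D is known."""
    return 1.0 - min(d, d_max) / d_max


s_far = similarity_unbounded(3.0)      # 0.25
s_near = similarity_bounded(0.5, 2.0)  # 0.75
```

Both forms give 1 for identical features; the bounded form also reaches exactly 0 at the maximum distance, which the unbounded form only approaches.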
  • the similarity definition unit 101 can calculate the total value of the distances between the corresponding feature points as the distance d between the behavioral features by using the following equation (7).
  • the distance d may instead be determined by equation (8): the similarity definition unit 101 can calculate, as the distance d between the behavioral features, the distance between the center of gravity of the feature points of the first behavioral feature and the center of gravity of the feature points of the second behavioral feature.
  • the distance d may also be defined as equation (9) or equation (10).
  • equations (9) and (10) are obtained from equations (7) and (8), respectively, by excluding information other than that in the height direction, based on the fact that differences in behavior due to posture are more likely to appear in the height direction than in the lateral direction.
  • a_y0 to a_y13 and b_y0 to b_y13 are the height-direction elements of the vectors a_0 to a_13 and b_0 to b_13, respectively.
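The four distance definitions described for equations (7) to (10) can be sketched as follows. The equations themselves are not reproduced in this text, so the code follows the prose only. It assumes two-dimensional feature points `(x, y)` with `y` as the height direction and Euclidean distance between points; the function names and the default hip indices 8 and 11 (taken from the line segments A_8 A_11 and B_8 B_11) are illustrative.

```python
import math


def to_hip_origin(points, hip_a=8, hip_b=11):
    """Express feature points relative to the midpoint of the two hip joints (origin O)."""
    ox = (points[hip_a][0] + points[hip_b][0]) / 2.0
    oy = (points[hip_a][1] + points[hip_b][1]) / 2.0
    return [(x - ox, y - oy) for x, y in points]


def total_point_distance(a, b):
    """Sum of the distances between corresponding feature points (as described for eq. 7)."""
    return sum(math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b))


def centroid(points):
    """Center of gravity of a set of feature points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_distance(a, b):
    """Distance between the two centers of gravity (as described for eq. 8)."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(ax - bx, ay - by)


def total_height_distance(a, b):
    """Height-only variant of the point-wise sum (as described for eq. 9)."""
    return sum(abs(ay - by) for (_, ay), (_, by) in zip(a, b))


def centroid_height_distance(a, b):
    """Height-only variant of the centroid distance (as described for eq. 10)."""
    return abs(centroid(a)[1] - centroid(b)[1])


# Tiny hypothetical skeletons with two feature points each.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 4.0), (0.0, 2.0)]
d7 = total_point_distance(a, b)       # 5.0
d8 = centroid_distance(a, b)          # 2.5
d9 = total_height_distance(a, b)      # 4.0
d10 = centroid_height_distance(a, b)  # 2.0
```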
  • the similarity s may be determined by equation (11), following the procedure for obtaining the angle formed by two vectors from their inner product.
  • the similarity s may be determined by the following equation (12) based on the angle formed by the line segment connecting the feature points.
  • the similarity definition unit 101 may set the distance d between two behavioral features, or the similarity s between two behavioral features, based on the movement information of the feature points of each person.
  • the similarity definition unit 101 acquires the behavioral features of each of person A and person B over time, and can calculate the movement information of the feature points of each person based on the plurality of (temporally continuous) behavioral features acquired for each person. For example, assume that the positions of the feature points of person A and person B change from the state shown in FIG. 3 to the state shown in FIG. 4 at the next acquisition opportunity.
  • FIG. 4 is another diagram for explaining an operation example of the similarity definition unit.
  • the distance d between the two behavioral features or the similarity s between the two behavioral features may be determined by, for example, equation (13), (14), (15), or (16). These equations are obtained by transforming equations (7), (9), (11), and (12), respectively, into equations that use the movement information of the feature points of each person.
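A movement-based counterpart of the point-wise distance might be sketched as below. Equation (13) is not reproduced in this text, so the form used here is an assumption: each person's movement information is taken to be the displacement of every feature point between two consecutive acquisitions, and the distance is the sum of the differences between corresponding displacements. Function names and sample points are hypothetical.

```python
import math


def displacements(prev_points, next_points):
    """Movement of each feature point between two consecutive acquisitions."""
    return [(nx - px, ny - py)
            for (px, py), (nx, ny) in zip(prev_points, next_points)]


def motion_distance(a_prev, a_next, b_prev, b_next):
    """Assumed form: sum over feature points of the difference in displacement."""
    da = displacements(a_prev, a_next)
    db = displacements(b_prev, b_next)
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(da, db))


# Person A's single feature point moves by (1, 0); person B's does not move.
d_motion = motion_distance([(0.0, 0.0)], [(1.0, 0.0)], [(5.0, 5.0)], [(5.0, 5.0)])
```

Under this form two people who move identically have distance 0 regardless of where they stand, which is the property that distinguishes it from the position-based distances.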
  • the similarity s between two behavioral features may also be determined based on whether or not the feature points are detected. For example, the similarity s may be determined by equation (17) using a function h(k) that becomes 1 when A_k and B_k are both detected or both not detected, and becomes 0 when only one of them is detected.
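The detection-based similarity can be sketched as follows. Equation (17) is not reproduced in this text, so two things are assumed: h(k) is read as 1 when the k-th feature point is detected for both persons or for neither, and 0 when it is detected for only one; and the similarity is taken as the mean of h(k) over all feature points. The function name and the detection flags are hypothetical.

```python
def detection_agreement(a_detected, b_detected):
    """Mean of h(k), where h(k) = 1 if point k is detected for both persons or for neither."""
    h = [1 if a == b else 0 for a, b in zip(a_detected, b_detected)]
    return sum(h) / len(h)


# Four feature points: the detections agree on points 0 and 2 only.
s_detect = detection_agreement([True, True, False, False],
                               [True, False, False, True])
```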
  • the similarity definition unit 101 may calculate a plurality of similarities using at least two of the above equations (7) to (17) and integrate them, by averaging or the like, to obtain the similarity to be stored in the similarity DB 112.
  • the similarity may be calculated by a method other than those illustrated here. Methods may also be combined: for example, if the behaviors belong to the same class, the similarity is set to 1, and if they belong to different classes, the similarity is determined for each feature.
  • FIGS. 5 and 6 are diagrams showing examples of information stored in the similarity DB 112, for the case where five classes 0 to 4 exist.
  • in one example, the similarity DB 112 stores one similarity for each combination of classes.
  • in the other example, the similarity DB 112 stores one similarity for each combination of the same class, and stores a calculation formula for obtaining the similarity for each combination of different classes. Note that these figures are merely examples, and the information stored in the similarity DB 112 is not limited to these figures.
  • the learning data generation unit 102 extracts a plurality of behavioral features from the feature DB 111 together with the class information associated with each behavioral feature.
  • the learning data generation unit 102 may take out a plurality of behavioral features to be processed from the feature DB 111 at random, or according to a predetermined rule. Then, the learning data generation unit 102 arbitrarily selects two behavioral features from those extracted from the feature DB 111, and identifies the combination of classes based on the class information associated with each of the two behavioral features. Then, the learning data generation unit 102 extracts from the similarity DB 112 the similarity, or the calculation formula for obtaining the similarity, that corresponds to the identified combination of classes.
  • when a calculation formula is extracted, the learning data generation unit 102 substitutes the two selected behavioral features into the calculation formula to obtain the similarity. Finally, the learning data generation unit 102 stores the two selected behavioral features and the similarity obtained using the information in the similarity DB 112 as a set of learning data in the learning DB 113.
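The generation of learning data described above can be sketched as below. This is a simplified illustration under stated assumptions: the similarity DB is modeled as a plain dictionary keyed by an ordered class pair whose values are either a stored similarity or a calculation formula (a callable), every pair of extracted features is enumerated rather than selected arbitrarily, and all names and sample values are hypothetical.

```python
from itertools import combinations


def build_learning_data(features, similarity_db):
    """features: list of (class_id, vector).

    similarity_db: {(i, j) with i <= j: float or callable(v, w) -> float}.
    Returns a list of (vector_1, vector_2, similarity) learning-data sets.
    """
    learning_data = []
    for (ci, vi), (cj, vj) in combinations(features, 2):
        entry = similarity_db[(min(ci, cj), max(ci, cj))]
        # A stored formula is evaluated on the two selected features;
        # a stored number is used as-is.
        s = entry(vi, vj) if callable(entry) else entry
        learning_data.append((vi, vj, s))
    return learning_data


# Hypothetical contents: fixed similarity within a class, a formula across classes.
similarity_db = {
    (0, 0): 1.0,
    (1, 1): 1.0,
    (0, 1): lambda v, w: 1.0 / (1.0 + abs(v[0] - w[0])),  # illustrative formula
}
features = [(0, [0.0]), (0, [1.0]), (1, [3.0])]
learning_data = build_learning_data(features, similarity_db)
```

Keying the dictionary by the ordered pair keeps one entry per combination of classes, so the lookup does not depend on the order in which the two features were drawn.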
  • the learning unit 103 extracts a required number of sets of similarity and behavioral features from the learning DB 113, and performs machine learning.
  • as the machine learning method, an existing method can be used.
  • the learning unit 103 introduces the similarity as a new variable and carries out machine learning.
  • for example, the learning data generation unit 102 generates learning data used for distance learning, and the learning unit 103 performs distance learning using the learning data.
  • when using a Siamese Network, the learning data generation unit 102 first extracts two behavioral features from the feature DB 111. Then, the learning data generation unit 102 obtains the similarity between the two extracted behavioral features as described above, combines the two behavioral features and the similarity obtained for them into one set, and stores the set in the learning DB 113 (example: FIG. 7).
  • FIG. 7 is a diagram showing an example of information stored in the learning DB 113.
  • when using a Siamese Network, the learning unit 103 extracts a required number of sets (learning data) of two behavioral features and a similarity from the learning DB 113, and performs machine learning. At this time, the learning unit 103 performs learning with the Loss of equation (18), substituting the similarity of the extracted learning data for s.
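A loss of this kind might be sketched as follows. Equation (18) is not reproduced in this text, so the form below is an assumption: it is the common contrastive loss with the usual binary label replaced by the continuous similarity s, which reduces to the ordinary same-pair and different-pair terms when s is 1 or 0. The function name, the margin value, and the sample embeddings are hypothetical.

```python
import math


def soft_contrastive_loss(x1, x2, s, margin=1.0):
    """Assumed form: s * d^2 + (1 - s) * max(margin - d, 0)^2 for embedding distance d."""
    d = math.dist(x1, x2)
    return s * d ** 2 + (1.0 - s) * max(margin - d, 0.0) ** 2


# Embeddings at distance 5 with margin 6: similar pairs are pulled together,
# dissimilar pairs are pushed out to the margin, and s = 0.5 blends the two terms.
loss_similar = soft_contrastive_loss([0.0, 0.0], [3.0, 4.0], s=1.0, margin=6.0)     # 25.0
loss_dissimilar = soft_contrastive_loss([0.0, 0.0], [3.0, 4.0], s=0.0, margin=6.0)  # 1.0
loss_partial = soft_contrastive_loss([0.0, 0.0], [3.0, 4.0], s=0.5, margin=6.0)     # 13.0
```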
  • the Triplet Network takes as input a set of three pieces of learning data, namely a reference Anchor sample, a Positive sample, and a Negative sample, and learning proceeds so that the Loss becomes small.
  • the Positive sample belongs to the same class as the Anchor sample.
  • the Negative sample belongs to a different class from the Anchor sample and the Positive sample.
  • d_p is the distance between the Anchor sample and the Positive sample.
  • d_n is the distance between the Anchor sample and the Negative sample.
  • m is a constant called a margin.
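With the quantities d_p, d_n, and m defined above, the standard triplet loss is max(d_p - d_n + m, 0). The sketch below shows that standard form; whether the source's equation matches it exactly cannot be confirmed from this text, and the function name, the Euclidean distance, and the sample embeddings are illustrative assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the negative is farther than the positive by the margin."""
    d_p = math.dist(anchor, positive)  # distance between Anchor and Positive samples
    d_n = math.dist(anchor, negative)  # distance between Anchor and Negative samples
    return max(d_p - d_n + margin, 0.0)


# d_p = 5 and d_n = 10: a margin of 1 is already satisfied, a margin of 7 is not.
loss_ok = triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0], margin=1.0)        # 0.0
loss_violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0], margin=7.0)  # 2.0
```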
  • The learning data generation unit 102 extracts, from the feature DB 111, one behavioral feature (referred to as A) as the Anchor sample and two further behavioral features (referred to as X and Y). Then, the learning data generation unit 102 obtains the similarity between the behavioral features A and X and between the behavioral features A and Y as described above. It is desirable that the behavioral features X and Y be selected so that the difference between the two similarities obtained here becomes large. For example, the learning data generation unit 102 can increase the difference between the two similarities by selecting one of the behavioral features X and Y from the same class as the behavioral feature A and the other from a class different from that of the behavioral feature A.
  • The learning data generation unit 102 may calculate the similarity to the behavioral feature A for each of a behavioral feature X and a behavioral feature Y randomly extracted from the feature DB 111, and may select the two behavioral features to be used for processing together with the behavioral feature A based on the difference between the calculated similarity between A and X and the calculated similarity between A and Y. For example, if the difference between the calculated similarity between A and X and the calculated similarity between A and Y is equal to or greater than a predetermined threshold value (for example, 0.5), the learning data generation unit 102 may select the behavioral feature X and the behavioral feature Y as the behavioral features used to generate the learning data, and if the difference is less than the predetermined threshold value, it may leave the behavioral feature X and the behavioral feature Y unselected.
  • The learning data generation unit 102 may also be configured to present to the user a screen including, for example, the calculated similarity between A and X and the calculated similarity between A and Y, and to determine, based on the user's selection operation on the screen, whether or not to select X and Y as the two behavioral features used for processing together with the behavioral feature A. Then, the learning data generation unit 102 combines the three behavioral features (A, X, and Y) and the two similarities (the similarity between A and X and the similarity between A and Y) into one set and stores the set in the learning DB 113 (example: FIG. 8).
  • FIG. 8 is a diagram showing another example of the information stored in the learning DB 113.
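The threshold-based selection of X and Y described above can be sketched as follows. This is a minimal illustration, not the implementation of the learning data generation unit 102: the dictionary `CLASS_SIMILARITY`, the class names, the feature layout (`cls`, `vec`), and the function names are all hypothetical, and only the threshold value 0.5 comes from the text.

```python
import random

# Hypothetical class-pair similarities standing in for the similarity DB 112.
CLASS_SIMILARITY = {
    frozenset(["walk"]): 1.0,
    frozenset(["walk", "run"]): 0.7,
    frozenset(["walk", "sit"]): 0.1,
}


def similarity(feature_a, feature_b):
    """Look up the similarity for the class combination of two features."""
    return CLASS_SIMILARITY[frozenset([feature_a["cls"], feature_b["cls"]])]


def select_triplet(anchor, candidates, threshold=0.5, max_tries=1000, rng=None):
    """Randomly draw X and Y and keep them only if the two similarities to
    the anchor differ by at least `threshold`; return None if none is found."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        x, y = rng.sample(candidates, 2)
        s_x, s_y = similarity(anchor, x), similarity(anchor, y)
        if abs(s_x - s_y) >= threshold:
            # One set: three behavioral features and two similarities.
            return {"A": anchor, "X": x, "Y": y, "s_x": s_x, "s_y": s_y}
    return None
```

A draw such as (run, sit) is accepted because the similarities 0.7 and 0.1 differ by 0.6, whereas (walk, run) is rejected because 1.0 and 0.7 differ by only 0.3.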
  • The learning unit 103 extracts a required number of sets (learning data), each consisting of three behavioral features and two similarities, from the learning DB 113 and performs machine learning. At this time, the learning unit 103 defines Loss as follows.
  • s_x and s_y represent the similarity between the behavioral features A and X and between the behavioral features A and Y, respectively. Likewise, d_x and d_y represent the distance between the behavioral features A and X and between the behavioral features A and Y, respectively.
  • The detailed configurations of the learning data generation unit 102 and the learning unit 103 have been described above for each machine learning method; however, other machine learning methods may be used, with the learning data generation unit 102 and the learning unit 103 configured accordingly.
  • FIG. 9 is a block diagram illustrating the hardware configuration of the feature learning system 100.
  • The components of the feature learning system 100 (the similarity definition unit 101, the learning data generation unit 102, and the learning unit 103) are realized by the information processing device 1000 (computer).
  • The information processing device 1000 includes a bus 1010, a processor 1020, a memory 1030, a storage device 1040, an input/output interface 1050, and a network interface 1060.
  • The bus 1010 is a data transmission path through which the processor 1020, the memory 1030, the storage device 1040, the input/output interface 1050, and the network interface 1060 transmit and receive data to and from each other.
  • The method of connecting the processor 1020 and the other components to each other is not limited to a bus connection.
  • The processor 1020 is a processor realized by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like.
  • The memory 1030 is a main storage device realized by a RAM (Random Access Memory) or the like.
  • The storage device 1040 is an auxiliary storage device realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a memory card, a ROM (Read Only Memory), or the like.
  • The storage device 1040 stores program modules that realize the functions of the information processing device 1000 (the similarity definition unit 101, the learning data generation unit 102, the learning unit 103, and so on).
  • When the processor 1020 reads each of these program modules into the memory 1030 and executes it, the function corresponding to that program module is realized.
  • The input/output interface 1050 is an interface for connecting the information processing device 1000 to various input/output devices.
  • An input device such as a mouse, keyboard, or touch panel, or an output device such as a display, may be connected to the input/output interface 1050.
  • The network interface 1060 is an interface for connecting the information processing device 1000 to a network.
  • This network is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network).
  • The network interface 1060 may connect to the network by a wireless connection or a wired connection.
  • The hardware configuration of the information processing device 1000 is not limited to the configuration illustrated in FIG. 9.
  • FIG. 10 is a flowchart showing a processing flow of the feature learning system 100 of the first embodiment.
  • The similarity definition unit 101 defines the similarity for each combination of behavioral feature classes and stores it in the similarity DB 112 (step S101, hereinafter simply referred to as S101).
  • The learning data generation unit 102 arbitrarily selects and extracts a plurality of behavioral features from the feature DB 111 (S102). Then, the learning data generation unit 102 refers to the similarity DB 112 based on the combination of the classes of the two extracted behavioral features and acquires the similarity corresponding to that combination (S103). For example, when a Siamese Network is used, the learning data generation unit 102 extracts two behavioral features from the feature DB 111. Then, the learning data generation unit 102 acquires, from the information stored in the similarity DB 112, the similarity corresponding to the combination of the first class, to which one of the two extracted behavioral features belongs, and the second class, to which the other belongs.
  • The learning data generation unit 102 can acquire the information "0.05" from the similarity DB 112 as the similarity corresponding to the combination of those classes.
  • The learning data generation unit 102 extracts the calculation formula for obtaining the similarity from the similarity DB 112. Then, the learning data generation unit 102 can acquire the similarity by substituting the numerical values of the two behavioral features into the extracted calculation formula. Then, the learning data generation unit 102 combines the plurality of behavioral features extracted in S102 and the similarity acquired in S103 into one set and stores the set in the learning DB 113 as learning data (S104).
  • The learning data generation unit 102 checks whether a sufficient number of sets of behavioral features and similarity (learning data) are stored in the learning DB 113 (S105). For example, the learning data generation unit 102 determines whether or not a predetermined number or more of learning data items are stored in the learning DB 113. When a sufficient number of learning data items are not stored in the learning DB 113 (NO in S105), the learning data generation unit 102 repeats the processes from S102 to S104. On the other hand, when a sufficient number of learning data items are stored in the learning DB 113 (YES in S105), the learning data generation unit 102 ends the process of generating the learning data. In this case, the process proceeds to step S106.
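Steps S102 to S105 for the Siamese Network case can be sketched as the loop below. This is a minimal illustration under stated assumptions: the feature DB 111 is modelled as a list of dictionaries with hypothetical keys `cls` and `vec`, the similarity DB 112 as a dictionary keyed by class combination whose entries are either a stored value or a callable standing in for a stored calculation formula, and the learning DB 113 as a plain list. None of these names or layouts come from the original text.

```python
import random


def generate_learning_data(feature_db, similarity_db, required_count, rng=None):
    """Build (feature, feature, similarity) sets until enough are stored."""
    if rng is None:
        rng = random.Random()
    learning_db = []
    # S105: keep generating until the required number of sets is stored.
    while len(learning_db) < required_count:
        # S102: arbitrarily pick two behavioral features.
        first, second = rng.sample(feature_db, 2)
        # S103: look up the entry for the combination of the two classes.
        entry = similarity_db[frozenset([first["cls"], second["cls"]])]
        # The entry is either a fixed value or a calculation formula.
        if callable(entry):
            s = entry(first["vec"], second["vec"])
        else:
            s = entry
        # S104: store the two features and the similarity as one set.
        learning_db.append((first, second, s))
    return learning_db
```

Keying the table by an unordered class combination means the lookup gives the same result regardless of which of the two features is drawn first.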
  • The learning unit 103 extracts a required number of sets of similarity and behavioral features (learning data) from the learning DB 113 and performs machine learning that takes the similarity into account (S106). For example, when a Siamese Network or a Triplet Network is used, the learning unit 103 proceeds with learning so that the value of the Loss defined by equation (18) or equation (20), which includes the similarity as a variable, becomes small.
  • The feature learning system 100 makes it possible to learn to identify a person's behavior while considering the similarity between behaviors, without changing the identification method from the conventional one. As a result, stable learning can be performed while suppressing the adverse effects of learning "behaviors that look similar but are different". In other words, a stable feature space that does not excessively emphasize differences between behaviors can be constructed, and an improvement in identification performance can be expected with the same identification method as before. In addition, although preprocessing such as principal component analysis or preliminary learning and identification may be required at learning time to determine the similarity, once the similarity has been determined, its value can continue to be used in subsequent learning, and a method that requires no preprocessing, such as determining the similarity manually, can also be adopted. Therefore, the labor required to prepare the learning data used for machine learning can be reduced compared with the conventional technique.
  • FIG. 11 is a diagram illustrating the configuration of the feature learning system 100 of the second embodiment.
  • The feature learning system 100 of the present embodiment further includes a display processing unit 104.
  • The display processing unit 104 outputs a screen showing the processing results of the learning data generation unit 102 (such as the determined similarity between behavioral features) to a display (not shown) provided for the person in charge of the work.
  • FIG. 12 is a diagram showing an example of a screen output by the display processing unit 104.
  • The display processing unit 104 displays a screen including two behavioral features (behavioral feature A and behavioral feature B) arbitrarily selected and extracted from the feature DB 111 and information indicating their similarity. With such a screen, a person who performs the work of generating the learning data can proceed with the work while checking the contents of the learning data.
  • The screen output by the display processing unit 104 is not limited to the example of FIG. 12.
  • The display processing unit 104 may generate a screen in which the two behavioral features are superimposed and output the screen to a display provided for the person in charge of the work.
  • The display processing unit 104 may, for example, adjust the transmittance of the image data of the two behavioral features so that the difference between the two behavioral features can be seen.
  • The display processing unit 104 may be configured to change the display mode of each feature point based on the similarity of the corresponding feature points between the two behavioral features. For example, the display processing unit 104 may change the shape or color of feature points having low (or high) similarity so that they are displayed with more emphasis than the other feature points.
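Deciding which feature points to emphasize can be sketched as follows. This is a minimal illustration that assumes each behavioral feature is a set of named 2D skeleton points and that a large Euclidean distance between corresponding points means low similarity. The point names, the threshold, and the function name are hypothetical, and the actual drawing of the screen is left out.

```python
import math


def flag_dissimilar_points(skeleton_a, skeleton_b, distance_threshold):
    """Return, per shared feature point, whether it should be emphasized
    because the two skeletons differ by more than `distance_threshold`."""
    flagged = {}
    # Only points present in both skeletons can be compared.
    for name in skeleton_a.keys() & skeleton_b.keys():
        distance = math.dist(skeleton_a[name], skeleton_b[name])
        flagged[name] = distance > distance_threshold
    return flagged
```

A renderer could then draw the flagged points in a different color or shape, which corresponds to the emphasized display described above.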
  • The display processing unit 104 may be configured to output a screen that further includes a display element allowing the person in charge of the work to select whether or not to store the learning data generated by the learning data generation unit 102 in the learning DB 113.
  • The display processing unit 104 may be configured to output a screen that further includes information indicating the distribution of the learning data already stored in the learning DB 113 (for example, the distribution based on the similarity included in the learning data).
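One way to summarize the distribution of stored learning data by similarity is a simple histogram, sketched below. This is an illustration only: it assumes the similarity values lie in the range 0 to 1, and the bin count and function name are hypothetical.

```python
def similarity_histogram(similarities, num_bins=5):
    """Count how many similarity values (assumed to lie in [0, 1]) fall
    into each of `num_bins` equal-width bins."""
    counts = [0] * num_bins
    for s in similarities:
        # A similarity of exactly 1.0 is placed in the last bin.
        index = min(int(s * num_bins), num_bins - 1)
        counts[index] += 1
    return counts
```

A heavily skewed histogram would indicate that the learning DB contains mostly very similar or mostly very dissimilar sets, which is the kind of imbalance the person in charge of the work may want to see before storing more data.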
  • FIG. 13 is a diagram showing another example of the screen output by the display processing unit 104.
  • The person in charge of the work can easily grasp which parts of the two behavioral features are similar (or dissimilar) from the display mode of the feature points. Further, the person in charge of the work can check information on the screen, such as the content of the learning data and the distribution of the learning data in the learning DB 113, select the necessary learning data, and store it in the learning DB 113.
  • The present invention can be applied to the identification of any feature that can be expressed as a vector.
  • 1. A feature learning system comprising: a similarity definition means that defines the similarity between two classes corresponding to each of two feature vectors; a learning data generation means that acquires the similarity based on the combination of classes to which each of a plurality of feature vectors acquired as a processing target belongs, and generates learning data including the plurality of feature vectors and the similarity; and a learning means that performs machine learning using the learning data.
  • 2. The feature learning system described in 1, in which the similarity definition means defines a calculation formula for obtaining the similarity between the two classes based on the two feature vectors, and the learning data generation means acquires the calculation formula for obtaining the similarity corresponding to the combination of classes to which each of the plurality of feature vectors acquired as the processing target belongs, and calculates the similarity by substituting the plurality of feature vectors into the calculation formula.
  • 3. The feature learning system described in 2, in which the similarity is calculated based on the norm of the difference between the feature vectors or between vectors obtained by reducing the dimension of the feature vectors, or on the angle formed by those vectors.
  • 4. The feature learning system described in any one of 1 to 3, in which the learning means uses distance learning (metric learning).
  • 5. The feature learning system described in any one of 1 to 4, in which the similarity is calculated based on the angle formed by the eigenvectors corresponding to the first principal component obtained for each class by performing principal component analysis for each class to which the feature vectors belong.
  • 6. The feature learning system described in any one of 1 to 4, in which the similarity is calculated based on the false recognition rate when the class is identified using the feature vector.
  • 7. The feature learning system described in any one of 1 to 6, in which the feature vector is a feature of human behavior, and the class to which the feature vector belongs is the type of behavior to which the feature of human behavior belongs.
  • 8. The feature learning system described in 7, in which the feature of human behavior includes information from one or more of a visible light camera, an infrared camera, and a depth sensor.
  • 9. The feature learning system described in 7, in which the feature of human behavior includes human skeleton information, and the human skeleton information includes position information of at least one of the head, neck, left elbow, right elbow, left hand, right hand, lumbar region, left knee, right knee, left foot, and right foot.
  • 10. The feature learning system described in 9, in which the similarity is calculated based on the distance between corresponding parts of the human skeleton information or the angle formed by line segments connecting the parts.
  • 11. A feature learning method in which a computer: defines the similarity between two classes corresponding to each of two feature vectors; acquires the similarity based on the combination of classes to which each of a plurality of feature vectors acquired as a processing target belongs; generates learning data including the plurality of feature vectors and the similarity; and performs machine learning using the learning data.
  • 12. The feature learning method described in 11, in which the computer defines a calculation formula for calculating the similarity between the two classes based on the two feature vectors, acquires the calculation formula for obtaining the similarity corresponding to the combination of classes to which each of the plurality of feature vectors acquired as the processing target belongs, and calculates the similarity by substituting the plurality of feature vectors into the calculation formula.
  • 13. The feature learning method described in 12, in which the similarity is calculated based on the norm of the difference between the feature vectors or between vectors obtained by reducing the dimension of the feature vectors, or on the angle formed by those vectors.
  • 14. The feature learning method described in any one of 11 to 13, in which the computer uses distance learning (metric learning) as the machine learning.
  • 15. The feature learning method described in any one of 11 to 14, in which the similarity is calculated based on the angle formed by the eigenvectors corresponding to the first principal component obtained for each class by performing principal component analysis for each class to which the feature vectors belong.
  • 16. The feature learning method described in any one of 11 to 14, in which the similarity is calculated based on the false recognition rate when the class is identified using the feature vector.
  • 17. The feature learning method described in any one of 11 to 16, in which the feature vector is a feature of human behavior, and the class to which the feature vector belongs is the type of behavior to which the feature of human behavior belongs.
  • 18. The feature learning method described in 17, in which the feature of human behavior includes information from one or more of a visible light camera, an infrared camera, and a depth sensor.
  • 19. The feature learning method described in 17, in which the feature of human behavior includes human skeleton information, and the human skeleton information includes position information of at least one of the head, neck, left elbow, right elbow, left hand, right hand, lumbar region, left knee, right knee, left foot, and right foot.
  • 20. The feature learning method described in 19, in which the similarity is calculated based on the distance between corresponding parts of the human skeleton information or the angle formed by line segments connecting the parts.
  • 21. A program that causes a computer to execute the feature learning method described in any one of 11 to 20.
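The notes above state that the similarity may be calculated from the norm of the difference between two feature vectors or from the angle they form. A minimal sketch of both quantities follows. How a norm or an angle is mapped onto a similarity score is not specified in this text, so the two mappings used here (1 / (1 + norm) and 1 - angle / pi) are assumptions chosen only so that identical vectors score 1, and all function names are hypothetical.

```python
import math


def difference_norm(u, v):
    """Euclidean norm of the difference between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle_between(u, v):
    """Angle in radians formed by two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def similarity_from_norm(u, v):
    """Assumed mapping: a smaller difference norm gives a similarity nearer 1."""
    return 1.0 / (1.0 + difference_norm(u, v))


def similarity_from_angle(u, v):
    """Assumed mapping: angle 0 gives 1.0 and angle pi gives 0.0."""
    return 1.0 - angle_between(u, v) / math.pi
```

The same functions apply unchanged to vectors obtained by reducing the dimension of the feature vectors, since they only require two sequences of equal length.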

PCT/JP2019/050642 2019-12-24 2019-12-24 Feature learning system, feature learning method, and program Ceased WO2021130864A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/785,554 US20230012026A1 (en) 2019-12-24 2019-12-24 Feature learning system, feature learning method, and non-transitory computer readable medium
JP2021566607A JP7367775B2 (ja) 2019-12-24 2019-12-24 Feature learning system, feature learning method, and program
PCT/JP2019/050642 WO2021130864A1 (ja) 2019-12-24 2019-12-24 Feature learning system, feature learning method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2019/050642 WO2021130864A1 (ja) 2019-12-24 2019-12-24 Feature learning system, feature learning method, and program

Publications (1)

Publication Number Publication Date
WO2021130864A1 true WO2021130864A1 (ja) 2021-07-01

Family

ID=76575800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/050642 Ceased WO2021130864A1 (ja) 2019-12-24 2019-12-24 Feature learning system, feature learning method, and program

Country Status (3)

Country Link
US (1) US20230012026A1
JP (1) JP7367775B2
WO (1) WO2021130864A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023103763A (ja) * 2022-01-14 2023-07-27 Hitachi, Ltd. AI learning data creation support system, AI learning data creation support method, and AI learning data creation support program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011194073A (ja) * 2010-03-19 2011-10-06 Konami Digital Entertainment Co Ltd Game device, game device control method, and program
JP2012174222A (ja) * 2011-02-24 2012-09-10 Olympus Corp Image recognition program, method, and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
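JP5992276B2 (ja) * 2012-09-20 2016-09-14 Toshiba Corp Person recognition device and method
CN105589914B (zh) * 2015-07-20 2018-07-06 Guangzhou Dongjing Computer Technology Co., Ltd. Web page pre-reading method, device, and intelligent terminal device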
US10769255B2 (en) * 2015-11-11 2020-09-08 Samsung Electronics Co., Ltd. Methods and apparatuses for adaptively updating enrollment database for user authentication
US11854308B1 (en) * 2016-02-17 2023-12-26 Ultrahaptics IP Two Limited Hand initialization for machine learning based gesture recognition
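CN111144217B (zh) * 2019-11-28 2022-07-01 Chongqing University of Posts and Telecommunications Action evaluation method based on three-dimensional human joint point detection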

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
16 August 2017, First edition. Tokyo: Corona Publishing Co., Ltd., ISBN: 978-4-339- 02812-6, article KIYOSHI ET AL.: " Multi-Agent Data Analysis", pages: 27 - 39 *
HADSELL R., CHOPRA S., LECUN Y.: "Dimensionality Reduction by Learning an Invariant Mapping", CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2006 IEEE COMPUTER SOCIETY , NEW YORK, NY, USA 17-22 JUNE 2006, IEEE, PISCATAWAY, NJ, USA, vol. 2, 17 June 2006 (2006-06-17) - 22 June 2006 (2006-06-22), Piscataway, NJ, USA , pages 1735 - 1742, XP010922992, ISBN: 978-0-7695-2597-6, DOI: 10.1109/CVPR.2006.100 *
JIANG WANG, YANG SONG, THOMAS LEUNG, CHUCK ROSENBERG, JINGBIN WANG, JAMES PHILBIN, BO CHEN, YING WU: "Learning Fine-Grained Image Similarity with Deep Ranking", 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 17 April 2014 (2014-04-17) - 28 June 2014 (2014-06-28), pages 1386 - 1393, XP055263324, ISBN: 978-1-4799-5118-5, DOI: 10.1109/CVPR.2014.180 *
SHIRAISHI, TATSUYA: "2-Q-13 Corpus based Visual Speech Synthesis using Perceptual Definition of Viseme", PROCEEDINGS OF THE SPRING MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN; TOKYO, JAPAN; 18-20 MARCH 2003, vol. 1, 18 March 2003 (2003-03-18) - 20 March 2003 (2003-03-20), pages 399 - 400, XP009533838 *

Also Published As

Publication number Publication date
JPWO2021130864A1 2021-07-01
JP7367775B2 (ja) 2023-10-24
US20230012026A1 (en) 2023-01-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957317

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021566607

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19957317

Country of ref document: EP

Kind code of ref document: A1