WO2021192085A1

WO2021192085A1 - Pose identifying apparatus, pose identifying method, and non-transitory computer readable medium storing program

Info

Publication number: WO2021192085A1
Application number: PCT/JP2020/013306
Authority: WO
Inventors: Yadong Pan
Original assignee: Nec Corporation
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2021-09-30
Also published as: JP2023512318A; JP7323079B2

Abstract

A grouping unit (12) of a pose identifying apparatus (10) counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point. The basic pattern includes a plurality of detected body region points corresponding to a plurality of base body region types that are different from each other. The plurality of grouping evaluation reference body region points are composed of some or all of a plurality of detected body region points included in the extracted basic pattern. The grouping unit (12) groups the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on a count value.

Description

POSE IDENTIFYING APPARATUS, POSE IDENTIFYING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PROGRAM

　　The present disclosure relates to a pose identifying apparatus, a pose identifying method, and a non-transitory computer readable medium storing a program.

　　A technique of identifying a pose of each person in an image including a plurality of person images respectively corresponding to a plurality of humans has been proposed (e.g., Patent Literature 1). The technique disclosed in Patent Literature 1 detects a plurality of body region points for a human in the image, and identifies the human's head from among the plurality of detected body region points so as to identify the human in the image. Then, the pose of the human is identified by associating the detected human body region point with another detected body region point.

PTL 1: US Patent Application Publication No. 2018/0293753

　　However, in the technique disclosed in Patent Literature 1, the person in the image is identified based only on his/her head. For this reason, for example, when the resolution of the image is low, the accuracy of the identification may decrease.

　　The inventor has found that an accuracy of identifying a person's pose can be improved by extracting a basic pattern including three or more base body region points as a human "core part".

　　Further, the present inventor has found that the speed of identifying a person's pose can be increased by using a detected body region point not adjacent to a base body region point directly as a grouping target detected body region point.

An object of the present disclosure is to provide a pose identifying apparatus, a pose identifying method, and a non-transitory computer-readable medium for storing a program, which can improve an accuracy of identifying a person's pose and increase the speed of identifying a person's pose.

　　A first example aspect is a pose identifying apparatus including: basic pattern extracting means for extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human , wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　grouping means for counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values counted for the extracted basic patterns.

　　A second example aspect is a pose identifying method including:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values for the extracted basic patterns.

　　A third example aspect is a non-transitory computer readable medium storing a program for causing a pose identifying apparatus to execute:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on a count values for the extracted basic patterns.

　　According to the present disclosure, it is possible to provide a pose identifying apparatus, a pose identifying method, and a non-transitory computer-readable medium for storing a program, which can improve an accuracy of identifying a person's pose and increase the speed of identifying a person's pose.

Fig. 1 is a diagram showing an example of a pose identifying apparatus according to a first example embodiment. Fig. 2 is a flowchart showing an example of a processing operation of a pose identifying apparatus according to the first example embodiment. Fig. 3 is a block diagram showing an example of a pose identifying apparatus according to a second example embodiment. Fig. 4 is a diagram showing an example of a plurality of predetermined detection target points for a human. Fig. 5 is a diagram for describing an extraction process of a basic pattern. Fig. 6 is a diagram for describing types of basic pattern candidates. Fig. 7 is a diagram for describing calculation of a base length. Fig. 8 is a diagram for describing grouping process. Fig. 9 is a diagram for describing a mid-point expected area. Fig. 10 is a diagram for describing grouping process. Fig. 11 is a diagram showing an example of a hardware configuration of the pose identifying apparatus.

　　Hereinafter, example embodiments will be described with reference to the drawings. In the example embodiments, the same or equivalent elements will be denoted by the same reference signs, and repeated descriptions will be omitted.

First example embodiment
<Configuration example of pose identifying apparatus>
　　Fig. 1 is a diagram showing an example of a pose identifying apparatus according to a first example embodiment. In Fig. 1, a pose identifying apparatus 10 includes a basic pattern extracting unit 11 and a grouping unit 12.

　　The basic pattern extracting unit 11 acquires information about a "position in an image" and a "point type" of each of a plurality of "detected body region points" and a plurality of "detected mid-points". The basic pattern extracting unit 11 extracts a "basic pattern" for each human from the plurality of "detected body region points" and the plurality of "detected mid-points". The plurality of "detected body region points" and the plurality of "detected mid-points" are a plurality of predetermined "detection target points" for a human in an image including a plurality of person images respectively corresponding to a plurality of humans. The plurality of "detected body region points" and the plurality of "detected mid-points" are detected by, for example, a neural network (not shown) for the plurality of predetermined detection target points including the plurality of "body region points" of the human and "mid-points" for respective "body region point pairs" each composed of two body region points. The "basic pattern" includes a plurality of "detected body region points (i.e., detected base body region points)" corresponding to a plurality of "base body region types" that are different from each other.

　　Here, each "body region point" included in the plurality of the predetermined "detection target points" relates to a human body region such as a neck, an eye, a nose, an ear, a shoulder, and an elbow of a human. The "mid-point" included in the plurality of the predetermined "detection target points" relates to a human body region in the case of a body region point pair composed of body region points directly connected by, for example, an arm, such as a right shoulder and a right elbow (i.e., body region point pair composed of body region points adjacent to each other). On the other hand, when the body region point pair is composed of body region points not directly connected to each other such as a right shoulder and a left elbow, the "mid-point" included in the plurality of predetermined "detection target points" may be a spatial point around the human according to the person's pose at that moment. The "detected body region point" and the "detected mid-point" are points detected by, for example, a neural network (not shown) for the "body region point" and the "mid-point", respectively, included in the plurality of predetermined "detection target points".

For example, the "basic pattern" includes at least one of the following two combinations. A first combination is a combination of three base body region points, which correspond to a neck, a left shoulder and a left ear each being a base body region type. A second combination is a combination of three base body region points, which correspond to a neck, a right shoulder and a right ear each being a base body region type. That is, the "basic pattern" corresponds to a core part that is most stably detectable in a human body in images.

　　The grouping unit 12 counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in a "mid-point expected area" obtained from links between each of a "plurality of grouping evaluation reference body region points" and a "grouping target detected body region point". The "plurality of grouping evaluation reference body region points" are composed of some or all of the plurality of detected base body region points included in the extracted basic pattern.

For example, the "plurality of grouping evaluation reference body region points" may be detected body region points each corresponding to one of the neck, left shoulder and right shoulder each being the base body region type. At this time, in the case of the basic pattern that does not include the detected body region point of the left shoulder, the two detected body region points corresponding to the neck and the right shoulder are the above-mentioned "plurality of grouping evaluation reference body region points". The "grouping target detected body region point" is each detected body region point for a body region type not included in the basic pattern. For example, when the body region types of the basic pattern are the neck, the right shoulder, the left shoulder, the right ear, and the left ear, the detected body region points for the eye, the nose, the elbow, etc. are grouping target detected body region points. The "mid-point expected area" is an area (middle area) including a "defined mid-point" defined as a "center point" of the above link.

　　The grouping unit 12 groups the grouping target detected body region point into one of a plurality of person groups respectively corresponding to the plurality of extracted basic patterns based on count values each of which is counted for the corresponding extracted basic pattern.

<Operation example of pose identifying apparatus>
An example of a processing operation of the pose identifying apparatus 10 having the above configuration will be described. Fig. 2 is a flowchart showing an example of the processing operation of the pose identifying apparatus according to the first example embodiment.

　　The basic pattern extracting unit 11 extracts a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points (Step S101).

　　The grouping unit 12 counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in the mid-point expected area obtained from links between each of the plurality of grouping evaluation reference body region points and the grouping target detected body region point (Step S102).

　　The grouping unit 12 groups the grouping target detected body region point into one of a plurality of person groups respectively corresponding to the plurality of extracted basic patterns based on count values each of which is counted for the corresponding extracted basic pattern (Step S103).

　　Note that the processing of Steps S102 and S103 (i.e., grouping process) is performed for each grouping target detected body region point. The processing of Steps S102 and S103 for a plurality of grouping target detected body region points may be performed in order or in parallel. By performing the processing of Steps S102 and S103 for the plurality of grouping target detected body region points in parallel, it is possible to increase the speed of identifying a human.

　　As described above, according to the first example embodiment, the basic pattern extracting unit 11 in the pose identifying apparatus 10 extracts the basic pattern for each human from the plurality of detected body region points and the plurality of "detected mid-points". The "basic pattern" includes the plurality of detected base body region points corresponding to the plurality of base body region types that are different from each other.

　　According to such a configuration of the pose identifying apparatus 10, the above-described basic pattern including a plurality of detected base body region points can be extracted as a "core part" of a human. By doing so, the accuracy of identifying the person's pose included in the image can be improved.

　　In the pose identifying apparatus 10, the grouping unit 12 counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in the mid-point expected area obtained from links between each of the plurality of grouping evaluation reference body region points and the grouping target detected body region point. The plurality of grouping evaluation reference body region points are composed of some or all of the plurality of detected base body region points included in the extracted basic pattern. Then, the grouping unit 12 groups the grouping target detected body region point into one of a plurality of person groups respectively corresponding to the plurality of extracted basic patterns based on count values each of which is counted for the corresponding extracted basic pattern.

　　With such a configuration of the pose identifying apparatus 10, a result of the grouping on a first grouping target detected body region point does not affect a result of the grouping on a second grouping target detected body region point adjacent to the first grouping target detected body region point. It is thus possible to execute the grouping process on the grouping target detected body region points regardless of whether a grouping evaluation reference body region point is adjacent to the grouping target detected body region point. For this reason, the grouping process for a plurality of grouping target detected body region points can be executed in parallel, so that the speed of identifying a person's pose can be increased.

Second example embodiment
The second example embodiment relates to a more specific example embodiment.

<Configuration example of pose identifying apparatus>
Fig. 3 is a block diagram showing an example of a pose identifying apparatus according to the second example embodiment. In Fig. 3, the pose identifying apparatus 20 includes a basic pattern extracting unit 21 and a grouping unit 22.

　　Like the basic pattern extracting unit 11 according to the first example embodiment, the basic pattern extracting unit 21 acquires information about a "position in an image" and a "point type" of each of a plurality of "detected body region points" and a plurality of "detected mid-points". The basic pattern extracting unit 21 extracts a "basic pattern" for each human from the plurality of "detected body region points" and the plurality of "detected mid-points". The plurality of "detected body region points" and the plurality of "detected mid-points" are a plurality of predetermined "detection target points" for a human in an image including a plurality of person images respectively corresponding to a plurality of humans. The plurality of "detected body region points" and the plurality of "detected mid-points" are detected by, for example, a neural network (not shown) for the plurality of predetermined detection target including the plurality of "body region points" of the human and "mid-points" for respective "body region point pairs" each composed of two body region points. The "basic pattern" includes a plurality of "detected body region points (i.e., detected base body region points)" corresponding to a plurality of "base body region types" that are different from each other.

　　Fig. 4 is a diagram showing an example of the plurality of predetermined detection target points for a human. In Fig. 4, the "plurality of predetermined detection target points" for a human include body region points N₀ to N₁₇. As shown in Fig. 4, the body region point N₀ corresponds to a neck. The body region point N₁ corresponds to a right shoulder. The body region point N₂ corresponds to a left shoulder. The body region point N₃ corresponds to a right ear. The body region point N₄ corresponds to a left ear. The body region point N₅ corresponds to a nose. The body region point N₆ corresponds to a right eye. The body region point N₇ corresponds to a left eye. The body region point N₈ corresponds to a right elbow. The body region point N₉ corresponds to a right wrist. The body region point N₁₀ corresponds to a left elbow. The body region point N₁₁ corresponds to a left wrist. The body region point N₁₂ corresponds to a right hip. The body region point N₁₃ corresponds to a left hip. The body region point N₁₄ corresponds to a right knee. The body region point N₁₅ corresponds to a left knee. The body region point N₁₆ corresponds to a right ankle. The body region point N₁₇ corresponds to a left ankle.

　　When the "plurality of grouping evaluation reference body region points" described in the first example embodiment are the body region points N₀, N₁, and N₂, the "plurality of predetermined detection target points" for a human include 39 mid-points corresponding to the respective combinations of the body region points N₀, N₁, and N₂ and the body region points N₅ to N₁₇. A mid-point between a body region point N_i and a body region point N_j is represented by a mid-point M_{i_j}. Like the mid-point, the detected mid-point is represented by a detected mid-point M_{i_j}, and the "defined mid-point" described in the first example embodiment is represented by a detected mid-point M'_{i_j}.

　　Thus, when the image includes human full body images of five persons, the basic pattern extracting unit 21 may acquire five sets of information each including the positions and the point types of the detected body region points N₀ to N₁₇ and 39 detected mid-points M.

　　Returning to the description of Fig. 3, like the basic pattern extracting unit 11 according to the first example embodiment, the basic pattern extracting unit 21 extracts a "basic pattern" for each human from the plurality of "detected body region points" and the plurality of "detected mid-points".

　　For example, as shown in Fig. 3, the basic pattern extracting unit 21 includes a basic pattern candidate identifying unit 21A, a base length calculating unit 21B, and a basic pattern forming unit 21C.

　　The basic pattern candidate identifying unit 21A identifies a plurality of "basic pattern candidates" by classifying, into the same basic pattern candidate, each combination which includes detected body region points that are close in distance to each other in the image from among a plurality of combinations of the plurality of detected base body region points corresponding to the "main type" and the plurality of detected body region points corresponding to the "sub types". The "main type" is, for example, the neck, and the "sub types" are the right shoulder, the left shoulder, the right ear, and the left ear. For example, the basic pattern candidate identifying unit 21A selects, for one detected body region point corresponding to the neck, one detected body region point corresponding to the right shoulder that is closest in distance to the one detected body region point corresponding to the neck from among the plurality of detected body region points corresponding to the right shoulder. This selection is made for each detected body region point corresponding to the neck. Then, when one detected body region point corresponding to the right shoulder is selected for the plurality of detected body region points corresponding to the neck, the basic pattern candidate identifying unit 21A selects one detected body region point corresponding to the neck that is closest in distance to the above-mentioned detected body region point corresponding to the right shoulder from among the plurality of detected body region points corresponding to the neck. That is, the basic pattern candidate identifying unit 21A performs processing using the MLMD (Mutual-Local-Minimum-Distance) algorithm. Thus, one detected body region point corresponding to the neck and one detected body region point corresponding to the right shoulder are selected, and these detected body region points are classified into the same "basic pattern candidate". The processing described above is performed for each of the left shoulder, the right ear, and the left ear.

　　The basic pattern forming unit 21C performs "optimization processing" on a plurality of basic pattern candidates identified by the basic pattern candidate identifying unit 21A to thereby form a plurality of basic patterns for the plurality of humans.

　　The "optimization processing" includes the following processes. A first process is a process of cutting one basic pattern candidate including the plurality of detected body region points corresponding to the main type to convert the one basic pattern candidate into a plurality of the basic pattern candidates each including one detection point corresponding to the main type. That is, when one basic pattern candidate includes a plurality of detected body region points corresponding to the neck, the one basic pattern candidate is converted into a plurality of basic pattern candidates each including one detected body region point corresponding to the neck.

　　A second process is a process of excluding, from each basic pattern candidate, a detected body region point(s) that is included in the basic pattern candidate, that corresponds to the sub type, and whose distance from the detected body region point corresponding to the main type is longer than a "base length for the basic pattern candidate".

　　A third process is a process of excluding a basic pattern candidate(s) not including any of a combination of three detected body region points of a "first body region type group" and a combination of three detected body region points of a "second body region type group". For example, the "first body region type group" includes the neck, the left shoulder, and the left ear, and the "second body region type group" includes the neck, the right shoulder, and the right ear.

　　The base length calculating unit 21B calculates the "base length for each basic pattern candidate" when the above-described first process is completed. The calculation of the "base length for each basic pattern candidate" will be described in detail later.

　　Like the grouping unit 12 according to the first example embodiment, the grouping unit 22 counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in a "mid-point expected area" obtained from links between each of a "plurality of grouping evaluation reference body region points" and a "grouping target detected body region point".

Then, the grouping unit 22 groups the grouping target detected body region point into one of a plurality of person groups respectively corresponding to the plurality of extracted basic patterns based on count values each of which is counted for the corresponding extracted basic pattern. For example, the grouping unit 22 may group the grouping target detected body region point into a person group corresponding to the basic pattern having the largest count value from among the plurality of extracted basic patterns. Alternatively, the grouping unit 22 may group the grouping target detected body region point into a person group corresponding to the basic pattern having the largest count value which is a predetermined value or greater (e.g., 2 or greater) from among the plurality of extracted basic patterns. Further, when there are a plurality of basic patterns having the largest count value, the grouping unit 22 may group the grouping target detected body region point into a basic pattern including the detected base body region point corresponding to the main type having the smallest distance from the grouping target detected body region point. This grouping process will be described in detail later.

<Operation example of pose identifying apparatus>
An example of the processing operation of the pose identifying apparatus 20 having the above configuration will be described.

<Basic pattern extraction process>
The basic pattern extracting unit 21 acquires information about a "position in an image" and a "point type" of each of a plurality of "detected body region points" and a plurality of "detected mid-points". The basic pattern extracting unit 21 extracts a "basic pattern" for each human from the plurality of "detected body region points" and the plurality of "detected mid-points".

Fig. 5 is a diagram for describing the basic pattern extraction process.

　　First, the basic pattern extraction process starts from a graph G shown in Fig. 5. The graph G includes all detected base body region point pairs corresponding to a group Sc of a "base body region type pair". The group Sc includes, as group elements, a pair of the neck and right shoulder, a pair of the neck and left shoulder, a pair of the neck and right ear, a pair of the neck and left ear, a pair of the right shoulder and right ear, and a pair of the left shoulder and left ear.

　　Next, the basic pattern candidate identifying unit 21A performs the processing using the MLMD algorithm on each base body region type pair to obtain a graph G-sub, and identifies, as the "basic pattern candidate", a block including a triangle(s) having the respective detected body region points corresponding to the neck in the graph G-sub as vertexes.

　　Fig. 6 is a diagram for describing types of the basic pattern candidates. As shown in Fig. 6, there may be five types of the basic pattern candidates, which are TA, TB, TC, TD, and TE. In Fig. 5, these five types of the basic pattern candidates are collectively referred to as "PATTERN-α". For example, the basic pattern candidate corresponding to the person facing the front is likely to be the type TA. There are basic pattern candidates of the types TB, TC, and TD due to a complex environment such as occlusions.

　　Then, the basic pattern forming unit 21C performs the optimization processing on "PATTERN-α" to form the plurality of basic patterns.

　　In the optimization processing, first, since the basic pattern candidate of the type TE shown in Fig. 6 includes two detected body region points corresponding to the neck, the basic pattern forming unit 21C divides the basic pattern candidate of the type TE into two basic pattern candidates each including one detected body region point corresponding to the neck (the above-described first process). Then, the basic pattern candidate of the type TB and the basic pattern candidate of the type TC are obtained. As a result, basic pattern candidates corresponding to the types TA, TB, TC, and TD remain.

　　Next, the basic length calculating unit 21B calculates the "base length" for each basic pattern candidate corresponding to any one of the types TA, TB, TC, and TD. The "base length" is a length that is a reference of a size of a human body.

　　Fig. 7 is a diagram for describing the calculation of the base length. First, the base length calculating unit 21B calculates lengths La, Lb, and Lc for each basic pattern candidate.

　　As shown in Fig. 7, the length La is calculated as a distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₁ corresponding to the right shoulder in the basic pattern candidate or a distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₂ corresponding to the left shoulder in the basic pattern candidate. Specifically, when the basic pattern candidate includes both the detected body region point N₁ corresponding to the right shoulder and the detected body region point N₂ corresponding to the left shoulder, the length La is equal to the smaller one of the distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₁ corresponding to the right shoulder and the distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₂ corresponding to the left shoulder. When the basic pattern candidate includes the detected body region point N₁ corresponding to the right shoulder but does not include the detected body region point N₂ corresponding to the left shoulder, the length La is the distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₁ corresponding to the right shoulder. When the basic pattern candidate includes the detected body region point N₂ corresponding to the left shoulder but does not include the detected body region point N₁ corresponding to the right shoulder, the length La is the distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₂ corresponding to the left shoulder.

　　As shown in Fig. 7, the length Lb is calculated as a distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₃ corresponding to the right ear in the basic pattern candidate or a distance between the detected body region point N₀ corresponding to the neck and the detected body region point N₄ corresponding to the left ear in the basic pattern candidate.

　　As shown in Fig. 7, the length Lc is calculated as, when there are detected mid-points M_{12_1} and M_{13_2} corresponding to the chest, a distance between the detected body region point N₀ corresponding to the neck in the basic pattern candidate and the detected mid-point M_12-1 that corresponds to the right chest and that is closest to the detected body region point N₀ or a distance between the detected body region point N₀ corresponding to the neck in the basic pattern candidate and the detected mid-point M_13-2 that corresponds to the left chest and that is closest to the detected body region point N₀. Further, when there is no detected mid-points M_{12_1} and M_{13_2} corresponding to the chest, the length Lc is calculated as Lc=La+Lb+1.

　　Next, the base length calculating unit 21B calculates the base length of each basic pattern candidate based on the calculated lengths La, Lb, and Lc. The base length calculating unit 21B calculates the base length by different calculation methods according to a large/small relation between "Lc" and "La+Lb" and a large/small relation between "Lb" and "La×2". As shown in Fig. 7, for example, when "Lc" is "La+Lb" or less and "Lb" is "La×2" or less, the base length is "Lc". When "Lc" is "La+Lb" or less and "Lb" is larger than "La×2", the base length is "Lc×1.17". When "Lc" is larger than "La+Lb" and "Lb" is "La×2" or less, the base length is "La+Lb". When "Lc" is larger than "La+Lb" and "Lb" is larger than "La×2", the base length is "Lb×1.7". In this example, "Lb" tends to be larger than "La×2" for the basic pattern candidates corresponding to a person facing sideways. Further, "Lb" tends to be "La×2" or less for basic pattern candidates corresponding to a person facing forward or backward. There are cases in which, in the basic pattern candidates corresponding to a person shown at a lower part of the image, his/her chest may not be shown in the image. In this case, "Lc" tends to be larger than "La+Lb".

　　Returning to the description of Fig. 5, the basic pattern forming unit 21C excludes, from each basic pattern candidate, a detected base body region point(s) that is included in the basic pattern candidate, that corresponds to the sub type, and whose distance from the detected base body region point corresponding to the main type is longer than the "base length for the basic pattern candidate" (the above second process). Then, for example, in the basic pattern candidate of the type TA including two triangles shown in Fig. 6, when the detected body region point corresponding to the ear included in one of the triangles is far from the detected body region point corresponding to the neck, this detected body region point corresponding to the ear is excluded from the basic pattern candidate. Thus, the basic pattern candidate of the type TA is changed to the basic pattern candidate of the type TC. Further, for example, in the basic pattern candidate of the type TA including two triangles, when the detected body region point corresponding to the shoulder included in one of the triangles is far from the detected body region point corresponding to the neck, this detected body region point corresponding to the shoulder is excluded from the basic pattern candidate. Thus, the basic pattern candidate of the type TA is changed to the basic pattern candidate of the type TB. The basic pattern candidate not including any triangle may appear as a result of the processing by this basic pattern forming unit 21C.

　　The basic pattern forming unit 21C excludes the basic pattern candidate(s) not including any of the combination of the three detected base body region points of the "first body region type group" and the combination of the three detected base body region points of the "second body region type group (the above-described third process). The "first body region type group" includes the neck, the left shoulder, and the left ear, and the "second body region type group" includes the neck, the right shoulder, and the right ear. That is, the basic pattern candidate(s) not including any of the above triangles is excluded by the processing of the basic pattern forming unit 21C. At this stage, as shown in Fig. 5, four types of the basic pattern candidates, i.e., the types TA, TB, TC, and TD, may remain. These remaining basic pattern candidates are the "basic patterns".

<Grouping process>
The grouping unit 22 counts, for each basic pattern, the number of links in which a corresponding detected mid-point is present in a "mid-point expected area" obtained from links between each of a "plurality of grouping evaluation reference body region points" and a "grouping target detected body region point".

　　Fig. 8 is a diagram for describing grouping process. In the example of Fig. 8, the

basic patterns

1, 2, and 3 are extracted. For example, as shown in Fig. 8, the grouping unit 22 connects the grouping target detected body region point N_i to each of the grouping evaluation reference body region points N₀, N₁, N₂ by a "temporary link". Here, the grouping evaluation reference body region points are detected base body region points corresponding to the neck, the right shoulder, and the left shoulder, respectively.

　　Next, the grouping unit 22 calculates the "mid-point expected area" for each temporary link. Fig. 9 is a diagram for describing the mid-point expected area. As shown in Fig. 9, the mid-point expected area corresponding to the temporary link is an oblong area centered on a center point M_{i_j}' between two detected body region points N_i and N_j of the temporary link (i.e., defined mid-point M_{i_j}'). In the example shown in Fig. 9, a major axis length Rmajor of the mid-point expected area is "link distance×0.75", and a minor axis length Rminor is "link distance×0.35".

　　Then, the grouping unit 22 determines whether the detected mid-point M_{i_j} is present in the mid-point expected area corresponding to the temporary link of the detected body region points N_i and N_j. Then, the grouping unit 22 defines the temporary link corresponding to the mid-point expected area where the detected mid-point M_{i_j} is present as a "candidate link". The grouping unit 22 counts the number of candidate links for each basic pattern. Fig. 8 shows the detected mid-points present in the mid-point expected area. That is, in the example of Fig. 8, the count number of the basic pattern 1 is "1", the count number of the basic pattern 2 is "3", and the count number of the basic pattern 3 is "2".

　　In this case, the grouping unit 22 may, for example, group the grouping target detected body region point N_i into a person group corresponding to the basic pattern 2 having the largest count number.

　　Fig. 10 is another diagram for describing the grouping process. In the example of Fig. 10, the count numbers of the

basic patterns

2 and 3 are both "2", and the

basic patterns

2 and 3 have the largest count number. In this case, the grouping unit 22 may group the grouping target detected body region point N_i into a person group corresponding to a basic pattern (i.e., the basic pattern 2) including a detected base body region point N_0-bp2 corresponding to the main type having the smallest distance from the grouping target detected body region point N_i.

Other Embodiments
Fig. 11 is a diagram showing an example of a hardware configuration of the pose identifying apparatus. In Fig. 11, the pose identifying apparatus includes a processor 101 and a memory 102. The processor 101 may be, for example, a microprocessor, a Micro Processing Unit (MPU), or a Central Processing Unit (CPU). The processor 101 may include a plurality of processors. The memory 102 is composed of a combination of a volatile memory and a non-volatile memory. The memory 102 may include a storage located separated from the processor 101. In this case, the processor 101 may access the memory 102 via an I/O interface (not shown).

　　Each of the pose identifying apparatuses 10 according to the first example embodiment and the pose identifying apparatuses 20 according to the second example embodiment can include the hardware configuration shown in Fig. 11. The basic

pattern extracting units

11 and 21 and the

grouping units

12 and 22 of the

pose identifying apparatuses

10 and 20 according to the first and second example embodiments may be achieved by the processor 101 reading a program stored in the memory 102 and executing it. The program can be stored and provided to the

pose identifying apparatuses

10 and 20 using any type of non-transitory computer readable media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), and optical magnetic storage media (e.g. magneto-optical disks). Examples of non-transitory computer readable media further include CD-ROM (Read Only Memory), CD-R, and CD-R/W. Examples of non-transitory computer readable media further include semiconductor memories. The semiconductor memories include, for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc. The program may be provided to the

pose identifying apparatuses

10 and 20 using any type of transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Transitory computer readable media can provide the program to the

pose identifying apparatuses

10 and 20 via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.

Although the present disclosure has been described with reference to the example embodiments so far, the present disclosure is not limited by the above. Various modifications that can be understood by a person skilled in the art within the scope of the present disclosure can be made to the configuration and details of the present disclosure.

　　The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.

(Supplementary note 1)
A pose identifying apparatus comprising:
　　basic pattern extracting means for extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　grouping means for counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values counted for the extracted basic patterns.

(Supplementary note 2)
　　The pose identifying apparatus according to Supplementary note 1, wherein
　　the grouping means groups the grouping target detected body region point into a person group having the largest count value from among the plurality of extracted basic patterns.

(Supplementary note 3)
　　The pose identifying apparatus according to Supplementary note 1, wherein
　　the grouping means groups the grouping target detected body region point into a person group corresponding a basic pattern whose count value is largest from among the plurality of extracted basic patterns and is equal to or greater than a predetermined value.

(Supplementary note 4)
　　The pose identifying apparatus according to

Supplementary note

2 or 3, wherein
　　when there are a plurality of the basic patterns having the largest count value, the grouping means groups the grouping target detected body region point into a person group corresponding to a basic pattern whose distance from the grouping target detected body region point in the image is shortest from among the basic patterns having the largest count value.

(Supplementary note 5)
　　The pose identifying apparatus according to any one of Supplementary notes 1 to 4, wherein
　　the mid-point expected area is a predetermined middle area including a defined mid-point defined as a center point of the link.

(Supplementary note 6)
　　The pose identifying apparatus according to any one of Supplementary notes 1 to 5, wherein
the plurality of base body region types include a main type and a plurality of sub types,
the basic pattern extracting means comprises:
　　basic pattern candidate identifying means for identifying a plurality of basic pattern candidates by classifying, into the same basic pattern candidate, combination which includes detected body region points that are close in distance to each other in the image from among a plurality of combinations of the plurality of detected body region points corresponding to the main type and the plurality of detected body region points corresponding to the respective sub types; and
　　basic pattern formation means for forming the plurality of basic patterns for the plurality of humans by performing optimization processing on the identified plurality of basic pattern candidates.

(Supplementary note 7)
　　The pose identifying apparatus according to Supplementary note 6, wherein the optimization processing comprises:
　　dividing one basic pattern candidate including the plurality of detected body region points corresponding to the main type and converting the one basic pattern candidate into the plurality of basic pattern candidates each including one detected body region point corresponding to the main type;
　　excluding, from each basic pattern candidate, the detected body region point that is included in the basic pattern candidate, that corresponds to the sub type, and whose distance from the detected body region point corresponding to the main type is longer than a base length for the basic pattern candidate; and
　　excluding the basic pattern candidate not including any of a combination of three detected body region points which belong to a first body region type group and a combination of three detected body region points which belong to a second body region type group.

(Supplementary note 8)
　　The pose identifying apparatus according to Supplementary note 7, wherein
　　the main type is a neck,
　　the sub types are a left shoulder, a right shoulder, a left ear, and a right ear,
　　the first body region type group includes the neck, the left shoulder, and the left ear, and
　　the second body region type group includes the neck, the right shoulder, and the right ear.

(Supplementary note 9)
　　The pose identifying apparatus according to Supplementary note 1, wherein the basic pattern includes at least one of a combination of three detected base body region points corresponding to a neck, a left shoulder, and a left ear and a combination of three base detected body region points corresponding to the neck, a right shoulder, and a right ear.

(Supplementary note 10)
　　A pose identifying method comprising:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values for the extracted basic patterns.

(Supplementary note 11)
　　A non-transitory computer readable medium storing a program for causing a pose identifying apparatus to execute:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on a count values for the extracted basic patterns.

10　　POSE IDENTIFYING APPARATUS
11　　BASIC PATTERN EXTRACTING UNIT
12　　GROUPING UNIT
20　　POSE IDENTIFYING APPARATUS
21　　BASIC PATTERN EXTRACTING UNIT
21A　　BASIC PATTERN CANDIDATE IDENTIFYING UNIT
21B　　BASE LENGTH CALCULATING UNIT
21C　　BASIC PATTERN FORMING UNIT
22　　GROUPING UNIT

Claims

　　A pose identifying apparatus comprising:
　　basic pattern extracting means for extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　grouping means for counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values counted for the extracted basic patterns.
　　The pose identifying apparatus according to Claim 1, wherein
　　the grouping means groups the grouping target detected body region point into a person group corresponding a basic pattern having the largest count value from among the plurality of extracted basic patterns.
　　The pose identifying means according to Claim 1, wherein
　　the grouping means groups the grouping target detected body region point into a person group corresponding a basic pattern whose count value is largest from among the plurality of extracted basic patterns and is equal to or greater than a predetermined value.
　　The pose identifying apparatus according to Claim 2 or 3, wherein
　　when there are a plurality of the basic patterns having the largest count value, the grouping means groups the grouping target detected body region point into a person group corresponding to a basic pattern whose distance from the grouping target detected body region point in the image is shortest from among the basic patterns having the largest count value.
The pose identifying apparatus according to any one of Claims 1 to 4, wherein
　　the mid-point expected area is a predetermined middle area including a defined mid-point defined as a center point of the link.
　　The pose identifying apparatus according to any one of Claims 1 to 5, wherein
the plurality of base body region types include a main type and a plurality of sub types,
the basic pattern extracting means comprises:
　　basic pattern candidate identifying means for identifying a plurality of basic pattern candidates by classifying, into the same basic pattern candidate, combination which includes detected body region points that are close in distance to each other in the image from among a plurality of combinations of the plurality of detected body region points corresponding to the main type and the plurality of detected body region points corresponding to the respective sub types; and
　　basic pattern formation means for forming the plurality of basic patterns for the plurality of humans by performing optimization processing on the identified plurality of basic pattern candidates.
　　The pose identifying apparatus according to Claim 6, wherein the optimization processing comprises:
　　dividing one basic pattern candidate including the plurality of detected body region points corresponding to the main type and converting the one basic pattern candidate into the plurality of basic pattern candidates each including one detected body region point corresponding to the main type;
　　excluding, from each basic pattern candidate, the detected body region point that is included in the basic pattern candidate, that corresponds to the sub type, and whose distance from the detected body region point corresponding to the main type is longer than a base length for the basic pattern candidate; and
　　excluding the basic pattern candidate not including any of a combination of three detected body region points which belong to a first body region type group and a combination of three detected body region points which belong to a second body region type group.
　　The pose identifying apparatus according to Claim 7, wherein
　　the main type is a neck,
　　the sub types are a left shoulder, a right shoulder, a left ear, and a right ear,
　　the first body region type group includes the neck, the left shoulder, and the left ear, and
　　the second body region type group includes the neck, the right shoulder, and the right ear.
　　The pose identifying apparatus according to Claim 1, wherein the basic pattern includes at least one of a combination of three detected base body region points corresponding to a neck, a left shoulder, and a left ear and a combination of three base detected body region points corresponding to the neck, a right shoulder, and a right ear.
　　A pose identifying method comprising:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on count values for the extracted basic patterns.
　　A non-transitory computer readable medium storing a program for causing a pose identifying apparatus to execute:
　　extracting a basic pattern for each human from a plurality of detected body region points and a plurality of detected mid-points, which are detected, in an image including a plurality of person images respectively corresponding to a plurality of humans, for a plurality of predetermined detection target points for a human, wherein the predetermined detection target points include a plurality of body region points of the human and a mid-point of each body region point pair composed of two body region points, and wherein the basic pattern includes a plurality of detected base body region points corresponding to a plurality of base body region types that are different from each other; and
　　counting, for each extracted basic pattern, the number of links in which a corresponding detected mid-point is present in a mid-point expected area obtained from links between a plurality of grouping evaluation reference body region points and a grouping target detected body region point, wherein the grouping evaluation reference body region points are composed of some or all of a plurality of detected base body region points included in the extracted basic pattern, and then grouping the grouping target detected body region point into one of a plurality of person groups respectively corresponding to a plurality of the extracted basic patterns based on a count values for the extracted basic patterns.