WO2023084780A1 - Image processing device, image processing method, and program - Google Patents
Image processing device, image processing method, and program
- Publication number
- WO2023084780A1 (PCT/JP2021/041928)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature amount
- detected
- image
- human body
- human bodies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20036—Morphological image processing
- G06T2207/20044—Skeletonization; Medial axis transform
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- The present invention relates to an image processing device, an image processing method, and a program.
- Patent Document 1 discloses technologies related to the present invention.
- Japanese Patent Laid-Open No. 2002-200000 describes a method of calculating a feature amount for each of a plurality of key points of a human body included in an image, and retrieving an image containing a human body with a similar posture or a similar movement based on the calculated feature amount.
- Techniques for grouping and classifying objects having similar postures and movements are disclosed.
- Non-Patent Document 1 discloses a technique related to human skeleton estimation.
- An object of the present invention is to improve the accuracy of techniques for retrieving images containing human bodies with similar postures and movements, and for grouping and classifying such images.
- According to the present invention, an image processing device is provided that includes: skeletal structure detection means for performing a process of detecting a plurality of key points corresponding to each of a plurality of parts of the human body included in an image; feature amount calculation means for calculating a feature amount of each of the detected key points; input means for receiving user input designating a method of integrating, for each part, the feature amounts of the key points detected from each of a plurality of human bodies; and processing means for calculating an integrated feature amount for each part by performing the integration for each part by the method designated by the user input, and for performing image search or image classification based on the integrated feature amount.
- According to the present invention, an image processing method is also provided in which a computer performs a skeletal structure detection step of performing a process of detecting a plurality of key points corresponding to each of a plurality of parts of the human body included in an image.
- According to the present invention, a program is also provided that causes a computer to function as: skeletal structure detection means for detecting a plurality of key points corresponding to each of a plurality of parts of the human body included in an image; feature amount calculation means for calculating a feature amount of each of the detected key points; input means for accepting user input designating a method of integrating, for each part, the feature amounts of the key points detected from each of a plurality of human bodies; and processing means for calculating an integrated feature amount for each part by performing the integration for each part by the method designated by the user input, and for performing image search or image classification based on the integrated feature amount.
- According to the present invention, it is possible to improve the accuracy of techniques for retrieving images containing human bodies with similar postures and movements, and for grouping and classifying such images.
- FIG. 10 is a diagram showing an example of processing for identifying the correspondence between frame images according to the embodiment.
- A diagram showing an example of processing for calculating an integrated feature amount from a moving image according to the embodiment.
- Two flow charts, each showing an example of the flow of processing of the image processing apparatus of the embodiment.
- Four diagrams, each explaining an example of processing for calculating an integrated feature amount from a still image according to the embodiment.
- Two diagrams, each schematically showing an example of information displayed by the image processing apparatus of the embodiment.
- A flow chart showing an example of the flow of processing of the image processing apparatus of the embodiment.
- Two diagrams, each showing an example of a functional block diagram of the image processing apparatus of the embodiment.
- A diagram schematically showing an example of information displayed by the image processing apparatus of the embodiment.
- The image processing apparatus of this embodiment detects keypoints corresponding to each part of the human body (hereinafter, "part of the human body" may be simply referred to as "part") from each of a plurality of human bodies, and integrates the feature amounts of the detected keypoints for each part to calculate an integrated feature amount for each part. Then, the image processing apparatus performs image search and image classification based on the calculated integrated feature amount for each part. With such an image processing apparatus, when a certain keypoint is not detected from one human body, its feature amount can be complemented with the feature amount of that keypoint detected from another human body. Therefore, an integrated feature amount can be calculated for every part.
- The illustrated first still image is an image of a person washing his hands, photographed from the person's left side.
- The right part of the person's body is hidden and not visible.
- When processing for detecting N keypoints of the human body is performed on such a first still image, only some of the N keypoints, namely those included in the non-hidden portion, are detected.
- The remaining keypoints, namely those included in the hidden portion, are not detected. As a result, the feature amounts of some keypoints are missing.
- The second still image is an image of a person washing his hands, photographed from the person's right side.
- The left part of the person's body is hidden and not visible.
- When processing for detecting N keypoints of the human body is performed on such a second still image, only some of the N keypoints, namely those included in the non-hidden portion, are detected.
- The remaining keypoints, namely those included in the hidden portion, are not detected. As a result, the feature amounts of some keypoints are missing.
- The image processing apparatus of the present embodiment integrates the feature amounts of the keypoints detected from the human body included in the first still image with the feature amounts of the keypoints detected from the human body included in the second still image.
- In this way, the feature amounts of keypoints not detected from the human body included in the first still image can be complemented with the feature amounts of the keypoints detected from the human body included in the second still image.
- Likewise, the feature amounts of keypoints not detected from the human body included in the second still image can be complemented with the feature amounts of the keypoints detected from the human body included in the first still image.
- As a result, integrated feature amounts corresponding to all of the N parts can be calculated. Using these integrated feature amounts to search for images containing human bodies with similar postures and movements, or to group and classify such images, improves the accuracy of the search and classification.
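- The complementing idea above can be illustrated with a minimal sketch. It is not taken from the patent: the part names follow the 14 keypoints used later in the description, and the assumption that the head and neck stay visible in both side views is made only for illustration.

```python
# The 14 parts named in the description (identifiers are illustrative).
PARTS = [
    "head", "neck",
    "right_shoulder", "left_shoulder", "right_elbow", "left_elbow",
    "right_hand", "left_hand", "right_hip", "left_hip",
    "right_knee", "left_knee", "right_foot", "left_foot",
]


def detected_in_side_view(visible_side):
    """Parts assumed detectable when only one side of the body is visible.

    Assumption: head and neck are visible from either side.
    """
    return {p for p in PARTS if p in ("head", "neck") or p.startswith(visible_side)}


first_image = detected_in_side_view("left")    # photographed from the left
second_image = detected_in_side_view("right")  # photographed from the right

# Parts with no feature amount when only the first image is used.
missing_first = set(PARTS) - first_image

# Together the two images cover every part, so each part can be integrated.
covered = first_image | second_image
```

Under these assumptions, six right-side parts are missing from the first image alone, while the union of the two images covers all 14 parts.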
- Each functional unit of the image processing apparatus is realized by any combination of hardware and software, centered on a CPU (Central Processing Unit) of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (which can store not only programs stored in advance from the stage of shipping the apparatus, but also programs downloaded from storage media such as CDs (Compact Discs) or from servers on the Internet), and an interface for network connection. Those skilled in the art will understand that there are various modifications to the implementation method and apparatus.
- FIG. 2 is a block diagram illustrating the hardware configuration of the image processing device.
- the image processing apparatus has a processor 1A, a memory 2A, an input/output interface 3A, a peripheral circuit 4A and a bus 5A.
- the peripheral circuit 4A includes various modules.
- the image processing device does not have to have the peripheral circuit 4A.
- the image processing device may be composed of a plurality of physically and/or logically separated devices. In this case, each of the plurality of devices can have the above hardware configuration.
- the bus 5A is a data transmission path for mutually transmitting and receiving data between the processor 1A, the memory 2A, the peripheral circuit 4A and the input/output interface 3A.
- the processor 1A is, for example, an arithmetic processing device such as a CPU or a GPU (Graphics Processing Unit).
- the memory 2A is, for example, RAM (Random Access Memory) or ROM (Read Only Memory).
- The input/output interface 3A includes an interface for acquiring information from an input device, an external device, an external server, an external sensor, a camera, and the like, and an interface for outputting information to an output device, an external device, an external server, and the like.
- Input devices are, for example, keyboards, mice, microphones, physical buttons, touch panels, and the like.
- the output device is, for example, a display, speaker, printer, mailer, or the like.
- the processor 1A can issue commands to each module and perform calculations based on the calculation results thereof.
- FIG. 3 shows an example of a functional block diagram of the image processing apparatus 100 of this embodiment.
- The illustrated image processing apparatus 100 includes a skeleton structure detection unit 101, a feature amount calculation unit 102, a processing unit 103, and a storage unit 104.
- Note that the image processing apparatus 100 need not have the storage unit 104.
- In that case, an external device has the storage unit 104.
- The storage unit 104 is then configured to be accessible from the image processing apparatus 100.
- the skeletal structure detection unit 101 performs processing to detect N (N is an integer equal to or greater than 2) keypoints corresponding to each of a plurality of parts of the human body included in the image.
- An image is a concept that includes still images and moving images.
- When the image is a moving image, the skeletal structure detection unit 101 performs processing to detect keypoints for each frame image.
- the processing by the skeletal structure detection unit 101 is realized using the technology disclosed in Japanese Patent Application Laid-Open No. 2002-200013. Although the details are omitted, the technique disclosed in Patent Document 1 detects the skeleton structure using the skeleton estimation technique such as OpenPose disclosed in Non-Patent Document 1.
- the skeletal structure detected by this technique consists of "keypoints", which are characteristic points such as joints, and "bones (bone links)", which indicate links between keypoints.
- FIG. 4 shows the skeletal structure of the human body model 300 detected by the skeletal structure detection unit 101.
- FIGS. 5 and 6 show detection examples of the skeletal structure.
- a skeleton structure detection unit 101 detects the skeleton structure of a human body model (two-dimensional skeleton model) 300 as shown in FIG. 4 from a two-dimensional image using a skeleton estimation technique such as OpenPose.
- the human body model 300 is a two-dimensional model composed of key points such as human joints and bones connecting the key points.
- the skeletal structure detection unit 101 extracts feature points that can be keypoints from an image, refers to information obtained by machine learning the image of the keypoints, and detects N keypoints of the human body.
- the N keypoints to detect are predetermined.
- The number of keypoints to be detected, that is, the value of N, and which parts of the human body are to be detected as keypoints can vary, and any variation can be adopted.
- In the following, assume that the head A1, neck A2, right shoulder A31, left shoulder A32, right elbow A41, left elbow A42, right hand A51, left hand A52, right hip A61, left hip A62, right knee A71, left knee A72, right foot A81, and left foot A82 are defined as the N keypoints (N = 14) to be detected.
- the human bones connecting these key points are bone B1 connecting head A1 and neck A2, bone B21 and bone B22 connecting neck A2 and right shoulder A31 and left shoulder A32, respectively.
- FIG. 5 is an example of detecting key points from an upright human body.
- an upright human body is imaged from the front and all 14 keypoints are detected.
- FIG. 6 shows an example of detecting key points from a squatting human body.
- a squatting human body is imaged from the right side, and only some of the 14 keypoints are detected.
- The head A1, the neck A2, the right shoulder A31, the right elbow A41, the right hand A51, the right hip A61, the right knee A71, and the right foot A81 are detected, and the left shoulder A32, the left elbow A42, the left hand A52, the left hip A62, the left knee A72, and the left foot A82 are not detected.
- the feature quantity calculation unit 102 calculates the feature quantity of the detected two-dimensional skeletal structure. For example, the feature quantity calculator 102 calculates a feature quantity for each detected keypoint.
- the feature value of the skeletal structure indicates the characteristics of the skeleton of a person, and is an element for classifying and searching the state (posture and movement) of a person based on the skeleton of the person.
- this feature quantity includes multiple parameters.
- the feature amount may be the feature amount of the entire skeleton structure, the feature amount of a part of the skeleton structure, or may include a plurality of feature amounts like each part of the skeleton structure. Any method such as machine learning or normalization may be used as the method for calculating the feature amount, and the minimum value or the maximum value may be obtained as the normalization.
- For example, the feature amount is a feature amount obtained by machine learning of the skeletal structure, the size of the skeletal structure on the image from head to foot, the relative positional relationship of a plurality of keypoints in the vertical direction of the skeletal region containing the skeletal structure on the image, the relative positional relationship of a plurality of keypoints in the lateral direction of the skeletal region, or the like.
- the size of the skeletal structure is the vertical height, area, etc. of the skeletal region containing the skeletal structure on the image.
- the vertical direction (height direction or vertical direction) is the vertical direction (Y-axis direction) in the image, for example, the direction perpendicular to the ground (reference plane).
- the left-right direction (horizontal direction) is the left-right direction (X-axis direction) in the image, for example, the direction parallel to the ground.
- It is preferable to use feature amounts that are robust for classification and search processing.
- a feature quantity that is robust to the person's orientation or body shape may be used.
- FIG. 7 shows an example of feature amounts for each of a plurality of key points obtained by the feature amount calculation unit 102.
- the feature amount of the keypoints exemplified here is merely an example, and the present invention is not limited to this.
- In this example, the keypoint feature amount indicates the relative positional relationship of multiple keypoints in the vertical direction of the skeletal region containing the skeletal structure on the image. Since the neck keypoint A2 is used as the reference point, the feature amount of keypoint A2 is 0.0, and the feature amounts of the right shoulder keypoint A31 and the left shoulder keypoint A32, which are at the same height as the neck, are also 0.0.
- The feature amount of the head keypoint A1, which is higher than the neck, is -0.2.
- The right hand keypoint A51 and the left hand keypoint A52, which are lower than the neck, have a feature amount of 0.4, and the right foot keypoint A81 and the left foot keypoint A82 have a feature amount of 0.9.
- The feature amount (normalized value) in this example indicates the feature in the height direction (Y direction) of the skeletal structure (keypoints) and is not affected by changes in the lateral direction (X direction) of the skeletal structure.
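- As an illustration only, a height-direction feature of this kind could be computed as below. The normalization by a skeleton height value, the function name, and the pixel coordinates are assumptions chosen so that the output matches the example values; the patent does not specify the exact formula.

```python
def vertical_features(keypoints_y, height, reference="neck"):
    """Return each keypoint's Y offset from the reference, divided by height.

    keypoints_y: part name -> pixel Y coordinate (image Y grows downward).
    height: assumed skeleton height in pixels used for normalization.
    Only detected keypoints should be present in keypoints_y.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    ref_y = keypoints_y[reference]
    return {part: (y - ref_y) / height for part, y in keypoints_y.items()}


# Hypothetical pixel coordinates for a standing person.
ys = {"head": 60, "neck": 100, "right_shoulder": 100,
      "right_hand": 180, "right_foot": 280}
features = vertical_features(ys, height=200)
```

Because only Y coordinates enter the calculation, shifting the person left or right in the image leaves these values unchanged, which is the property the text describes.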
- the processing unit 103 integrates feature amounts of key points detected from each of M (M is an integer equal to or greater than 2) human bodies for each part, and calculates an integrated feature amount for each part. Then, the processing unit 103 performs image search or image classification based on the integrated feature amount for each part.
- the plurality of key points correspond to each of the plurality of parts. For this reason, performing processing "for each part" is the same as performing processing "for each key point". For example, "an integrated feature amount for each part” obtained by calculating for each part has the same meaning as "an integrated feature amount for each of N key points" obtained by calculating for each key point.
- the user designates M human bodies to be processed for calculating integrated feature amounts.
- The user may designate M human bodies by designating M still images each including one human body (designating M still image files). The M still images are designated, for example, by an operation of inputting M still images to the image processing apparatus 100, or by an operation of selecting M still images from a plurality of still images stored in the image processing apparatus 100.
- the skeleton structure detection unit 101 described above performs processing for detecting N keypoints for each of the designated M still images. Note that all N keypoints may be detected, or only some of the N keypoints may be detected.
- a feature amount calculation unit 102 calculates a feature amount for each of the detected keypoints.
- Alternatively, the user may designate M human bodies by designating at least one still image (designating at least one still image file) and designating M regions, each including one human body, in the at least one designated still image.
- A plurality of regions, that is, a plurality of human bodies, may be designated from one still image.
- the process of designating a partial area in a still image can be realized using any conventional technique.
- the skeletal structure detection unit 101 described above performs processing for detecting N keypoints for each of the designated M regions. Note that all N keypoints may be detected, or only some of the N keypoints may be detected.
- a feature amount calculation unit 102 calculates a feature amount for each of the detected keypoints.
- After the feature amounts of the keypoints of each of the M human bodies specified by the user are calculated, the processing unit 103 integrates them for each keypoint to calculate an integrated feature amount. For example, the processing unit 103 sequentially selects one of the N keypoints and performs a process of calculating an integrated feature amount.
- Hereinafter, the one of the N keypoints selected as the processing target is referred to as the "first keypoint".
- When the first keypoint is not detected from some of the M human bodies, the integrated feature amount of the first keypoint (synonymous with "integrated feature amount of the first part") is calculated based on the feature amount of the first keypoint detected from the other human bodies. This process makes it possible to integrate the keypoint feature amounts calculated from each of a plurality of human bodies so that they complement each other's missing portions.
- The detection state of the first keypoint is one of the following: (1) detected from only one of the M human bodies, (2) detected from a plurality of the M human bodies, or (3) not detected from any of the M human bodies.
- the processing unit 103 can calculate an integrated feature amount through processing according to each detection state. A detailed description will be given below.
- (1) Detected from only one of the M human bodies
- In this case, the processing unit 103 sets the feature amount of the first keypoint detected from that one human body as the integrated feature amount of the first keypoint.
- (2) Detected from a plurality of the M human bodies: in this case, the processing unit 103 performs any one of calculation examples 1 to 4 below to calculate the integrated feature amount of the first keypoint.
- Calculation example 1: the processing unit 103 calculates a statistical value of the feature amounts of the first keypoints detected from the plurality of human bodies as the integrated feature amount of the first keypoint.
- The statistical value is, for example, the mean, median, mode, maximum, or minimum.
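- A minimal sketch of statistic-based integration is shown below, using Python's standard `statistics` module. The function name, the method labels, and the convention of passing only detected values are assumptions for illustration, not part of the patent.

```python
import statistics

# Supported statistics; the string keys are illustrative labels.
STATISTICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "max": max,
    "min": min,
}


def integrate_by_statistic(values, method="mean"):
    """Integrate the feature amounts of one keypoint with a statistic.

    values: feature amounts of the same keypoint, taken only from the
    human bodies in which that keypoint was detected.
    Returns None when the keypoint was detected in no body.
    """
    if not values:
        return None
    return STATISTICS[method](values)
```

For instance, `integrate_by_statistic([0.4, 0.6])` averages the two observations, while `method="max"` keeps the larger one.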
- Calculation example 2: the processing unit 103 sets the feature amount with the highest certainty, among the feature amounts of the first keypoints detected from the plurality of human bodies, as the integrated feature amount of the first keypoint.
- A score output in association with each detected keypoint may be used as the certainty of each keypoint.
- Calculation example 3: the processing unit 103 calculates a weighted average of the feature amounts of the first keypoints detected from the plurality of human bodies, weighted according to their certainty, as the integrated feature amount of the first keypoint.
- Here too, a score output in association with each detected keypoint may be used as the certainty of each keypoint.
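- The two certainty-based options can be sketched as follows. This assumes each observation is a (feature amount, certainty) pair and that the certainty is a non-negative detector score; using the certainty directly as the weight is an illustrative choice, not something the text fixes.

```python
def select_most_certain(observations):
    """Return the feature amount whose certainty is highest.

    observations: list of (feature_amount, certainty) pairs for one
    keypoint, one pair per human body in which it was detected.
    """
    feature, _ = max(observations, key=lambda obs: obs[1])
    return feature


def certainty_weighted_average(observations):
    """Return the average of the feature amounts weighted by certainty."""
    total = sum(certainty for _, certainty in observations)
    if total == 0:
        raise ValueError("at least one certainty must be positive")
    return sum(f * c for f, c in observations) / total


# Hypothetical observations of the same keypoint from two human bodies.
obs = [(0.4, 0.9), (0.6, 0.3)]
best = select_most_certain(obs)
blended = certainty_weighted_average(obs)
```

Selecting the most certain value discards the weaker observation entirely, whereas the weighted average lets it contribute in proportion to its score.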
- Calculation example 4: the user designates the priority of each of the designated M human bodies.
- The designated content is input to the image processing apparatus 100.
- The processing unit 103 identifies the human body with the highest priority among the plurality of human bodies from which the first keypoint is detected.
- The feature amount of the first keypoint detected from that human body is set as the integrated feature amount of the first keypoint.
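- Priority-based integration can be sketched as below. The data layout (a mapping from a body identifier to a feature amount, with `None` for an undetected keypoint) and the identifiers are assumptions made for this example.

```python
def integrate_by_priority(features_by_body, priority):
    """Return the feature amount from the highest-priority detecting body.

    features_by_body: body id -> feature amount of one keypoint, or None
    when the keypoint was not detected from that body.
    priority: body ids ordered from highest to lowest priority.
    """
    for body_id in priority:
        value = features_by_body.get(body_id)
        if value is not None:
            return value
    return None  # the keypoint was not detected from any body


# Hypothetical case: the top-priority body did not yield this keypoint.
features = {"body_a": None, "body_b": 0.4, "body_c": 0.5}
result = integrate_by_priority(features, ["body_a", "body_b", "body_c"])
```

Here `body_a` has the highest priority but lacks the keypoint, so the value from `body_b` is used.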
- (3) Not detected from any of the M human bodies: in this case, the processing unit 103 does not calculate the integrated feature amount of the first keypoint.
- the user designates M human bodies to be processed for calculating integrated feature amounts.
- the user may designate M human bodies by designating M moving pictures (M moving picture file designations) each including one human body.
- The M moving images are designated, for example, by an operation of inputting M moving images to the image processing apparatus 100, or by an operation of selecting M moving images from a plurality of moving images stored in the image processing apparatus 100.
- the skeletal structure detection unit 101 described above performs a process of detecting N keypoints for frame images of each of the designated M moving images. Note that all N keypoints may be detected, or only some of the N keypoints may be detected.
- a feature amount calculation unit 102 calculates a feature amount for each of the detected keypoints.
- Alternatively, the user may designate M human bodies by designating at least one moving image (designating at least one moving image file) and designating, in the at least one designated moving image, M scenes (each a part of the moving image, composed of some of the frame images included in the moving image) or M regions, each including one human body.
- a plurality of scenes or a plurality of areas may be designated from one moving image.
- the process of designating a partial scene or partial area in a moving image can be realized using any conventional technology.
- The skeletal structure detection unit 101 described above performs processing to detect N keypoints for the frame images of each of the designated M scenes (or for the partial regions designated by the user in the frame images). Note that all N keypoints may be detected, or only some of the N keypoints may be detected.
- a feature amount calculation unit 102 calculates a feature amount for each of the detected keypoints.
- After the feature amounts of the keypoints of each of the M human bodies specified by the user are calculated, the processing unit 103 integrates them for each keypoint to calculate an integrated feature amount.
- the processing unit 103 identifies correspondence relationships between frame images in M moving images and M scenes, and integrates keypoint feature amounts detected from each of a plurality of corresponding frame images for each keypoint. A more detailed description will be given below with reference to FIGS. 10 to 12.
- The processing unit 103 associates with each other the frame images in which the human body performing a predetermined movement in the first moving image and the human body performing the predetermined movement in the second moving image have the same posture.
- corresponding frame images are connected by lines.
- one frame image of the first moving image may be associated with a plurality of frame images of the second moving image.
- one frame image of the second moving image may be associated with a plurality of frame images of the first moving image.
- The correspondence relationship can be identified using, for example, a technique such as DTW (Dynamic Time Warping).
- In this case, the correspondence relationship can be identified using a distance between feature amounts (such as the Manhattan distance or the Euclidean distance).
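- The following is a textbook DTW sketch with the Manhattan distance, given only to illustrate how frame correspondences might be obtained. Representing each frame as a fixed-length list of numbers with no missing values is an assumption; the patent does not describe its DTW implementation or how missing keypoints are handled at this step.

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_alignment(seq_a, seq_b, dist=manhattan):
    """Align two frame sequences; return (total cost, list of (i, j) pairs)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal cost of aligning the first i and first j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames correspond.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical one-dimensional features; the second video lingers on the first pose.
video_1 = [[0.0], [1.0], [2.0]]
video_2 = [[0.0], [0.0], [1.0], [2.0]]
total, pairs = dtw_alignment(video_1, video_2)
```

In this toy input, frame 0 of the first sequence is paired with frames 0 and 1 of the second, which is the one-to-many association the text allows.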
- time-series data of integrated feature amounts of N keypoints can be obtained.
- "F11 + F21" in FIG. 12 represents the integrated feature amount of the N keypoints obtained by integrating the keypoint feature amounts of the human body detected from frame image F11 of the first moving image in FIG. 10 with the keypoint feature amounts of the human body detected from frame image F21 of the second moving image.
- the means for integrating the feature amounts of the keypoints of the human body detected from the corresponding frame images is the same as the above-described means for integrating the feature amounts of the keypoints of the human body detected from the still image.
- In image search, the processing unit 103 uses as a query the integrated feature amount calculated based on the M human bodies specified by the user as described above, and searches for still images including a human body whose posture is similar to the posture indicated by the integrated feature amount, or for moving images including a human body whose movement is similar to the movement indicated by the time-series data of the integrated feature amounts.
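- A query-by-feature search of this kind might look like the sketch below. The distance measure (mean absolute difference over parts present in both the query and the candidate), the function names, and the in-memory database are assumptions for illustration; the actual search method is the one in Patent Document 1, which is not reproduced here.

```python
def pose_distance(query, candidate):
    """Mean absolute difference over parts available in both poses.

    Returns None when the two poses share no detected part.
    """
    shared = [p for p, v in query.items()
              if v is not None and candidate.get(p) is not None]
    if not shared:
        return None
    return sum(abs(query[p] - candidate[p]) for p in shared) / len(shared)


def search(query, database, top_k=3):
    """Return up to top_k image ids ordered from most to least similar."""
    scored = []
    for image_id, features in database.items():
        d = pose_distance(query, features)
        if d is not None:
            scored.append((d, image_id))
    scored.sort()
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical integrated query and stored per-image features.
query = {"head": -0.2, "right_hand": 0.4, "left_hand": 0.4}
database = {
    "img_a": {"head": -0.2, "right_hand": 0.4, "left_hand": 0.5},
    "img_b": {"head": -0.2, "right_hand": 0.9, "left_hand": 0.9},
    "img_c": {"head": None, "right_hand": None, "left_hand": None},
}
hits = search(query, database)
```

Because the query is an integrated feature amount with every part filled in, fewer parts drop out of the comparison than with a query built from a single partially hidden body.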
- the method of searching can be realized using the technology disclosed in Patent Document 1.
- In image classification, the processing unit 103 treats the posture and movement indicated by the integrated feature amount calculated based on the M human bodies specified by the user as one target of the classification processing, and groups similar postures and movements together.
- the method of classification can be realized using the technology disclosed in Patent Document 1.
- the processing unit 103 may register postures and movements indicated by integrated feature amounts calculated based on the M human bodies specified by the user as described above in the database (storage unit 104) as one processing target.
- The plurality of postures and movements registered in the database may be used as targets to be collated with a query in the image search processing, or as targets of the image classification processing. For example, by photographing the same person from a plurality of angles with a plurality of cameras and designating the human bodies of that person included in the plurality of captured images as the M human bodies, an integrated feature amount that well indicates the posture and movement of that human body is calculated and registered in the database.
- the image processing device 100 acquires at least one image (S10).
- the image processing apparatus 100 performs a process of detecting N keypoints from each of the M human bodies included in at least one acquired image (S11). From each human body, all N keypoints may be detected, or only some of the N keypoints may be detected.
- the image processing apparatus 100 calculates feature amounts of the detected keypoints for each human body (S12).
- the image processing apparatus 100 integrates the feature amounts of the keypoints detected from each of the M human bodies, and calculates an integrated feature amount of each of the N keypoints (S13).
- the image processing apparatus 100 performs image search or image classification based on the integrated feature amount calculated in S13 (S14).
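- As an illustration of S14 only, the following is a minimal sketch of a search over integrated feature amounts. The actual search method is the one in Patent Document 1, which is not reproduced here; the mean Euclidean distance over shared keypoints, the `Pose` type, and the names `pose_distance` and `search` are all assumptions made for this example.

```python
import math
from typing import Optional, Sequence

# One integrated feature amount per keypoint; None means "no feature amount".
Pose = Sequence[Optional[Sequence[float]]]


def pose_distance(query: Pose, candidate: Pose) -> float:
    """Mean Euclidean distance over keypoints present in both poses (assumed metric)."""
    dists = [
        math.dist(q, c)
        for q, c in zip(query, candidate)
        if q is not None and c is not None
    ]
    # With no keypoint in common the poses cannot be compared.
    return sum(dists) / len(dists) if dists else math.inf


def search(query: Pose, database: dict[str, Pose], top_k: int = 3) -> list[str]:
    """Return the names of the top_k registered poses closest to the query."""
    ranked = sorted(database, key=lambda name: pose_distance(query, database[name]))
    return ranked[:top_k]
```

- Skipping keypoints that have no feature amount on either side is one possible design choice; it lets a query built from partially hidden bodies still be compared with registered poses.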
- the image processing device 100 selects one of the N keypoints as a processing target (S20).
- the selected keypoint is hereinafter referred to as the first keypoint.
- the image processing apparatus 100 performs processing according to the number of human bodies from which the first keypoints are detected.
- When the first keypoint is detected from only one human body, the image processing apparatus 100 outputs the feature amount of the first keypoint detected from that one human body as the integrated feature amount of the first keypoint (S23).
- When the first keypoint is detected from a plurality of human bodies, the image processing apparatus 100 outputs the result of arithmetic processing based on the feature amounts of the first keypoint detected from those human bodies as the integrated feature amount of the first keypoint (S24).
- the details of the arithmetic processing are as described above.
- When the first keypoint is not detected from any human body, the processing unit 103 does not calculate the integrated feature amount of the first keypoint and outputs that there is no feature amount (S22).
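- The branching on the number of human bodies from which a keypoint is detected (S20 to S24) can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the vector form of a feature amount, the function names, and the use of an element-wise mean as the arithmetic processing in S24 are assumptions for illustration.

```python
from typing import Optional, Sequence

Feature = Sequence[float]  # feature amount of one keypoint (assumed vector form)


def integrate_keypoint(features: Sequence[Optional[Feature]]) -> Optional[list[float]]:
    """Integrate one keypoint's feature amounts across M human bodies.

    features[i] is the feature amount from human body i, or None when the
    keypoint was not detected from that body.
    """
    detected = [f for f in features if f is not None]
    if not detected:
        return None  # S22: no feature amount for this keypoint
    if len(detected) == 1:
        return list(detected[0])  # S23: adopt the single detected feature amount
    # S24: arithmetic processing; the element-wise mean is an assumed example.
    dim = len(detected[0])
    return [sum(f[d] for f in detected) / len(detected) for d in range(dim)]


def integrate_all(
    bodies: Sequence[Sequence[Optional[Feature]]],
) -> list[Optional[list[float]]]:
    """bodies[i][k] is the feature amount of keypoint k from human body i."""
    n_keypoints = len(bodies[0])
    return [integrate_keypoint([body[k] for body in bodies]) for k in range(n_keypoints)]
```

- For example, with two bodies where only the second one shows keypoint 1 and neither shows keypoint 2, `integrate_all` averages keypoint 0, copies keypoint 1 from the second body, and reports `None` for keypoint 2.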
- the image processing apparatus 100 of the present embodiment integrates feature amounts of keypoints detected from each of a plurality of human bodies, and calculates an integrated feature amount of each of the plurality of keypoints. Then, the image processing apparatus performs image search and image classification based on the calculated integrated feature amount. According to such an image processing apparatus, it is possible to supplement keypoint feature amounts that have not been detected from a certain human body with keypoint feature amounts that have been detected from another human body. Therefore, integrated feature amounts corresponding to all key points can be calculated. By performing image search and image classification based on integrated feature amounts corresponding to all key points, the accuracy is improved.
- N keypoints of multiple human bodies P as shown in FIGS. 15 and 16 can be integrated.
- the still image in FIG. 15 is an image of a person washing his hands photographed from the left side of the person.
- the left side of the body of the person is visible, but the right side of the body is hidden.
- keypoints included in the left portion of the body of the person are detected, but keypoints included in the right portion are not detected.
- the still image in FIG. 16 is an image of a person washing his hands taken from the right side of the person.
- the right side of the person's body is visible, but the left side of the body is hidden.
- N keypoints of a plurality of human bodies P as shown in FIGS. 17 and 18 can be integrated.
- the still image in FIG. 17 is an image of a person standing with his/her left hand on his/her waist, photographed from the front of the person.
- the still image in FIG. 18 is an image of a person standing with his or her right hand raised, photographed from the front of the person. A part of the left half of the person's body is hidden by the vehicle Q in the second still image.
- For a portion that appears in both the first still image and the second still image, the feature amount of the portion as it appears in the second still image is adopted.
- the calculated N integrated feature values indicate a standing posture with the left hand on the waist as in the first still image and the right hand raised as in the second still image.
- N keypoints of a plurality of human bodies P as shown in FIGS. 19 and 20 can be integrated.
- The moving image in FIG. 19 shows a person raising his or her right hand while standing, photographed from the front of the person. A part of the left half of the person's body is hidden by the vehicle Q in the first moving image.
- keypoints included in the non-hidden portion of the person's body are detected, but keypoints included in the hidden portion are not detected.
- The moving image in FIG. 20 shows a person standing with his or her hands on the waist, photographed from the front of the person. In the second moving image, no part of the person's body is hidden.
- The parts missing from the first moving image are supplemented by the second moving image, so integrated feature amounts corresponding to all N keypoints can be calculated.
- The method of Example 4 described above, that is, calculation of the integrated feature amount based on the priority of each of the M human bodies, may be performed.
- the user assigns a higher priority to the human body included in the first moving image than the human body included in the second moving image.
- For a portion that appears in both the first moving image and the second moving image, the feature amount of the portion as it appears in the first moving image is adopted.
- The time-series data of the calculated N integrated feature amounts then indicates a movement of raising the right hand while standing, as in the first moving image, with the left hand on the waist, as in the second moving image.
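- The priority-based integration referred to above can be sketched as follows. The details of Example 4 are given elsewhere in the description, so this is only an assumed reading: for each keypoint, the feature amount from the highest-priority human body in which that keypoint was detected is adopted. The function name and the convention that a larger number means a higher priority are hypothetical.

```python
from typing import Optional, Sequence

Feature = Sequence[float]  # feature amount of one keypoint (assumed vector form)


def integrate_by_priority(
    bodies: Sequence[Sequence[Optional[Feature]]],
    priorities: Sequence[int],
) -> list[Optional[Feature]]:
    """Adopt, per keypoint, the feature amount of the highest-priority body that has it.

    bodies[i][k] is the feature amount of keypoint k from human body i
    (None if not detected); a larger priorities[i] means a higher priority
    (an assumed convention).
    """
    # Body indices ordered from highest to lowest priority.
    order = sorted(range(len(bodies)), key=lambda i: priorities[i], reverse=True)
    n_keypoints = len(bodies[0])
    integrated: list[Optional[Feature]] = []
    for k in range(n_keypoints):
        chosen = next((bodies[i][k] for i in order if bodies[i][k] is not None), None)
        integrated.append(chosen)
    return integrated
```

- For moving images, the same selection would be applied to each frame of the time-series data. With the first moving image given the higher priority, a right hand visible in both is taken from the first, while a left hand hidden in the first is taken from the second.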
- M human bodies may be the human bodies of the same person, or may be the human bodies of different people.
- the image processing apparatus 100 of this embodiment differs from that of the first embodiment in the details of processing for integrating key points detected from each of M human bodies and calculating an integrated feature amount.
- the integrated feature amount is calculated according to the flow shown in FIG. 14, for example.
- the image processing apparatus 100 integrates the keypoints detected from each of the M human bodies by a method specified by user input to calculate an integrated feature amount. A detailed description will be given below.
- FIG. 21 shows an example of a functional block diagram of the image processing device 100 of this embodiment.
- the illustrated image processing apparatus 100 has a skeleton structure detection unit 101 , a feature amount calculation unit 102 , a processing unit 103 , a storage unit 104 and an input unit 106 .
- The image processing apparatus 100 need not have the storage unit 104. In that case, an external device has the storage unit 104, and the storage unit 104 is configured to be accessible from the image processing apparatus 100.
- the input unit 106 accepts user input specifying a method of integrating key point feature quantities detected from each of the M human bodies.
- the input unit 106 can accept the above user input via any input device such as a touch panel, keyboard, mouse, physical button, microphone, gesture input device, and the like.
- the processing unit 103 integrates the feature amounts detected from each of the M human bodies for each keypoint using a method designated by user input, and calculates integrated feature amounts for each of the N keypoints.
- the input unit 106 and the processing unit 103 can execute either of the following processing examples 1 and 2.
- The input unit 106 accepts user input designating, for each of the M human bodies, the keypoints whose feature amounts are to be adopted. This is synonymous with an input specifying, for each keypoint, the human body from which the detected feature amount of that keypoint is to be adopted. Then, the processing unit 103 determines the feature amount of the first keypoint detected from the human body specified by the user input as the integrated feature amount of the first keypoint.
- The input unit 106 may display a human body model in which N objects R corresponding to the N keypoints are arranged at the corresponding skeletal positions of the human body, and may accept, for each of the M human bodies, user input selecting the objects corresponding to keypoints whose calculated feature amounts are to be adopted or the objects corresponding to keypoints whose calculated feature amounts are not to be adopted.
- The input unit 106 may display the names of the body parts corresponding to the keypoints, such as the head, neck, right shoulder, left shoulder, right elbow, left elbow, right hand, left hand, right hip, left hip, right knee, left knee, right leg, and left leg, and may accept, for each of the M human bodies, user input selecting from those names the keypoints whose calculated feature amounts are to be adopted or not adopted.
- UI (user interface) components such as check boxes may be used.
- The input unit 106 may display a human body model in which N objects R corresponding to the N keypoints are arranged at the corresponding skeletal positions of the human body, and may accept user input selecting at least a portion of the body. Then, the input unit 106 may determine the keypoints present in the body part selected by the user input as keypoints whose calculated feature amounts are adopted, or as keypoints whose calculated feature amounts are not adopted.
- a frame W is used to select at least a portion of the body. The user adjusts the position and size of the frame W so that the desired key points are included in the frame W.
- The input unit 106 may display the names of body parts such as the upper body, lower body, right body, and left body, and accept user input selecting at least one of them. Then, the input unit 106 may determine the keypoints present in the body part selected by the user input as keypoints whose calculated feature amounts are adopted, or as keypoints whose calculated feature amounts are not adopted.
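- Processing example 1 can be sketched as follows: the user selects, for each human body, keypoints or whole body parts whose feature amounts are to be adopted, and those feature amounts become the integrated feature amounts. This is a minimal sketch under stated assumptions: the keypoint identifiers, the `BODY_PARTS` grouping (only the right and left body are shown), the function name, and the rule that a later selection overwrites an earlier one for the same keypoint are all hypothetical.

```python
from typing import Iterable, Optional, Sequence

Feature = Sequence[float]  # feature amount of one keypoint (assumed vector form)

# Keypoint names follow the body parts listed in the text; identifiers are assumed.
KEYPOINTS = [
    "head", "neck",
    "right_shoulder", "left_shoulder", "right_elbow", "left_elbow",
    "right_hand", "left_hand", "right_hip", "left_hip",
    "right_knee", "left_knee", "right_leg", "left_leg",
]

# Assumed grouping of keypoints into selectable body parts.
BODY_PARTS = {
    "right_body": [k for k in KEYPOINTS if k.startswith("right_")],
    "left_body": [k for k in KEYPOINTS if k.startswith("left_")],
}


def adopt_selected(
    features: dict[str, dict[str, Optional[Feature]]],
    selection: dict[str, Iterable[str]],
) -> dict[str, Optional[Feature]]:
    """Build integrated feature amounts from user-designated keypoints.

    features[body][keypoint] is the detected feature amount (missing or None
    if not detected); selection[body] lists keypoint names or body-part names
    whose feature amounts are to be adopted from that body.
    """
    integrated: dict[str, Optional[Feature]] = {k: None for k in KEYPOINTS}
    for body, names in selection.items():
        for name in names:
            # A body-part name expands to its keypoints; otherwise it is a keypoint.
            for keypoint in BODY_PARTS.get(name, [name]):
                value = features[body].get(keypoint)
                if value is not None:
                    integrated[keypoint] = value
    return integrated
```

- Keypoints that the user designates but that were not detected from the designated body remain without a feature amount, which matches the case in which no integrated feature amount is calculated.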
- The input unit 106 receives user input designating, for each keypoint, the weight of the feature amount calculated from each of the M human bodies. Then, the processing unit 103 calculates, as the integrated feature amount of each keypoint, a weighted average of the feature amounts calculated from each of the M human bodies, using the weights specified by the user.
- the input unit 106 may receive an input specifying the weight of the specified keypoint after receiving the input specifying the keypoint individually by the method described in the first processing example.
- The input unit 106 may accept an input specifying a body part by the method described in processing example 1, and may then further accept an input specifying a weight common to all keypoints included in the specified body part.
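- The weighted average of processing example 2 can be sketched as follows for a single keypoint. The function name is hypothetical, and renormalizing the weights over only those human bodies from which the keypoint was detected is an assumption; the text does not state how undetected keypoints are treated in the weighted average.

```python
from typing import Optional, Sequence

Feature = Sequence[float]  # feature amount of one keypoint (assumed vector form)


def weighted_integration(
    features: Sequence[Optional[Feature]],
    weights: Sequence[float],
) -> Optional[list[float]]:
    """Weighted average of one keypoint's feature amounts across M human bodies.

    features[i] is the feature amount from human body i (None if not detected)
    and weights[i] is the user-specified weight for that body and keypoint.
    """
    # Keep only bodies that contribute: detected and given a positive weight.
    pairs = [(f, w) for f, w in zip(features, weights) if f is not None and w > 0]
    if not pairs:
        return None  # no integrated feature amount for this keypoint
    total = sum(w for _, w in pairs)
    dim = len(pairs[0][0])
    return [sum(f[d] * w for f, w in pairs) / total for d in range(dim)]
```

- Setting one weight to 1 and the others to 0 reduces this to processing example 1, in which the feature amount of a single designated human body is adopted.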
- the image processing device 100 acquires at least one image (S30).
- the image processing apparatus 100 receives a user input designating a method of integrating keypoint feature amounts detected from each of M (M is an integer equal to or greater than 2) human bodies (S31).
- the image processing apparatus 100 performs a process of detecting N keypoints from each of the M human bodies included in at least one acquired image (S32). From each human body, all N keypoints may be detected, or only some of the N keypoints may be detected.
- the image processing apparatus 100 calculates feature amounts of the detected keypoints for each human body (S33).
- the image processing apparatus 100 integrates the feature amounts of the keypoints detected from each of the M human bodies by the method specified in S31, and calculates integrated feature amounts of each of the N keypoints (S34).
- the image processing apparatus 100 performs image search or image classification based on the integrated feature amount calculated in S34 (S35).
- the same effects as those of the first embodiment are realized.
- Since the user can specify the method of integration, it is possible to calculate the integrated feature amount desired by the user.
- the image processing apparatus 100 of the present embodiment has a function of outputting information identifying key points for which integrated feature amounts have been calculated and key points for which integrated feature amounts have not been calculated. A detailed description will be given below.
- FIG. 25 shows an example of a functional block diagram of the image processing device 100 of this embodiment.
- the illustrated image processing apparatus 100 includes a skeleton structure detection unit 101 , a feature amount calculation unit 102 , a processing unit 103 , a storage unit 104 and a display unit 105 .
- FIG. 26 shows another example of a functional block diagram of the image processing device 100 of this embodiment.
- the illustrated image processing apparatus 100 has a skeleton structure detection unit 101 , a feature amount calculation unit 102 , a processing unit 103 , a storage unit 104 , a display unit 105 and an input unit 106 .
- The image processing apparatus 100 does not have to have the storage unit 104. In that case, an external device has the storage unit 104, and the storage unit 104 is configured to be accessible from the image processing apparatus 100.
- The display unit 105 displays information that distinguishes keypoints that were not detected from any of the M human bodies designated by the user, and for which no integrated feature amount was calculated, from keypoints that were detected from at least one of the M human bodies and for which an integrated feature amount was calculated.
- The display unit 105 may display a human body model in which N objects R corresponding to the N keypoints are arranged at the corresponding skeletal positions of the human body, and may display the objects corresponding to keypoints that were not detected and for which no integrated feature amount was calculated and the objects corresponding to keypoints that were detected from at least one of the M human bodies and for which an integrated feature amount was calculated so that they are distinguishable from each other.
- The objects may be displayed distinguishably by filling or not filling them, as shown in FIG. 27, but the method is not limited to this.
- Other methods include, for example, giving the objects different colors or different shapes, and highlighting (for example, blinking) the objects corresponding to the keypoints for which the integrated feature amount was calculated or the objects corresponding to the keypoints for which it was not calculated.
- the display unit 105 may further display information identifying the keypoints detected from each of the M human bodies specified by the user and the keypoints not detected from each of the human bodies. That is, the display unit 105 may further display information for identifying regions where keypoints have been detected and regions where no keypoints have been detected.
- the display can be realized by a method similar to the method described using FIG. 27 .
- the same effects as those of the first and second embodiments are achieved.
- Based on the information displayed by the display unit 105, the user can easily grasp which of the N keypoints are covered by the designated M human bodies. Further, by using an image such as that shown in FIG. 27, the user can grasp this intuitively. As a result, the user can grasp what kind of human body should be added in order to generate integrated feature amounts for all N keypoints.
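- The coverage information shown by the display unit 105 can be sketched as follows. The text describes a graphical human body model as in FIG. 27; this sketch substitutes a plain-text listing in which `[x]` stands for a filled object and `[ ]` for an unfilled one. The function names and the data layout are assumptions for illustration.

```python
from typing import Optional, Sequence

Feature = Sequence[float]  # feature amount of one keypoint (assumed vector form)


def coverage(
    bodies: dict[str, dict[str, Optional[Feature]]],
    keypoints: Sequence[str],
) -> dict[str, list[str]]:
    """Map each keypoint to the human bodies from which it was detected."""
    return {
        k: [name for name, feats in bodies.items() if feats.get(k) is not None]
        for k in keypoints
    }


def render(cov: dict[str, list[str]]) -> str:
    """Text stand-in for the human body model: [x] = calculated, [ ] = not calculated."""
    lines = []
    for keypoint, sources in cov.items():
        mark = "[x]" if sources else "[ ]"
        detail = ", ".join(sources) if sources else "not detected"
        lines.append(f"{mark} {keypoint}: {detail}")
    return "\n".join(lines)
```

- Listing the source bodies next to each keypoint also covers the optional per-body display, and the unmarked keypoints tell the user which view of a human body still needs to be added.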
- 1. An image processing apparatus having: skeletal structure detection means for performing a process of detecting a plurality of keypoints corresponding to each of a plurality of parts of a human body included in an image; feature amount calculation means for calculating a feature amount of each of the detected keypoints; input means for receiving user input designating a method of integrating, for each part, the feature amounts of the keypoints detected from each of a plurality of human bodies; and processing means for calculating an integrated feature amount for each part by performing the integration for each part by the method designated by the user input, and for performing image search or image classification based on the integrated feature amount.
- 2. The image processing apparatus according to 1, wherein the input means receives the user input designating, for each part, from which of the plurality of human bodies the calculated feature amount is to be adopted, and the processing means determines the feature amount calculated from the human body designated by the user input as the integrated feature amount for each part.
- 3. The image processing apparatus according to 2, wherein the input means displays, for each of the plurality of human bodies, a human body model in which a plurality of objects are arranged at the parts of the human body, and receives the user input selecting the objects corresponding to the parts whose calculated feature amounts are to be adopted or the objects corresponding to the parts whose calculated feature amounts are not to be adopted.
- 4. The image processing apparatus according to 2, wherein the input means displays a human body model for each of the plurality of human bodies, accepts the user input selecting at least a part of the body in the human body model, and determines the parts existing in the part of the body selected by the user input as parts whose calculated feature amounts are adopted or as parts whose calculated feature amounts are not adopted.
- the input means Receiving the user input designating the weight of the feature amount calculated from each of the plurality of human bodies for each part;
- the processing means 2.
- 6. The image processing apparatus according to any one of 1 to 5, further comprising display means for displaying information identifying the parts that were not detected from any of the plurality of human bodies, or were not detected from the human body designated by the user input, and for which the integrated feature amount was not calculated, and the parts that were detected from at least one of the plurality of human bodies, or were detected from the human body designated by the user input, and for which the integrated feature amount was calculated.
- 7. The image processing apparatus wherein the display means displays a human body model in which a plurality of objects are arranged at the parts of the human body, and displays the objects corresponding to the parts for which the integrated feature amount was calculated and the objects corresponding to the parts for which the integrated feature amount was not calculated so as to be mutually identifiable.
- 8. The image processing apparatus according to 6 or 7, wherein the display means further displays, in association with each of the plurality of human bodies, information identifying the parts where the keypoint was detected and the parts where the keypoint was not detected.
- 9. An image processing method in which a computer performs steps including: a skeletal structure detection step of performing a process of detecting a plurality of keypoints corresponding to each of a plurality of parts of a human body included in an image; and a feature amount calculation step of calculating a feature amount of each of the detected keypoints.
- 10. A program causing a computer to function as: skeletal structure detection means for detecting a plurality of keypoints corresponding to each of a plurality of parts of a human body included in an image; feature amount calculation means for calculating a feature amount of each of the detected keypoints; input means for receiving user input designating a method of integrating, for each part, the feature amounts of the keypoints detected from each of a plurality of human bodies; and processing means for calculating an integrated feature amount for each part by performing the integration for each part by the method designated by the user input, and for performing image search or image classification based on the integrated feature amount.
- REFERENCE SIGNS LIST 100 image processing device 101 skeleton structure detection unit 102 feature amount calculation unit 103 processing unit 104 storage unit 105 display unit 106 input unit 1A processor 2A memory 3A input/output I/F 4A peripheral circuit 5A bus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/041928 WO2023084780A1 (ja) | 2021-11-15 | 2021-11-15 | 画像処理装置、画像処理方法、およびプログラム |
| US18/708,227 US20250014212A1 (en) | 2021-11-15 | 2021-11-15 | Image processing apparatus, image processing method, and non-transitory storage medium |
| JP2023559386A JP7726291B2 (ja) | 2021-11-15 | 2021-11-15 | 画像処理装置、画像処理方法、およびプログラム |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/041928 WO2023084780A1 (ja) | 2021-11-15 | 2021-11-15 | 画像処理装置、画像処理方法、およびプログラム |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023084780A1 (ja) | 2023-05-19 |
Family
ID=86335447
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/041928 (Ceased), WO2023084780A1 (ja) | 2021-11-15 | 2021-11-15 | 画像処理装置、画像処理方法、およびプログラム |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20250014212A1 |
| JP (1) | JP7726291B2 |
| WO (1) | WO2023084780A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119201868A (zh) * | 2024-08-20 | 2024-12-27 | 中移互联网有限公司 | 云盘图片去重方法 |
| JP7646273B1 (ja) * | 2024-12-06 | 2025-03-17 | 株式会社Tomody | 動画撮像装置、動画撮像方法およびプログラム |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2016058078A (ja) * | 2014-09-05 | 2016-04-21 | ザ・ボーイング・カンパニーThe Boeing Company | 連想メモリによって分類されたフレームを使用して位置に対する計量値を取得すること |
| CN109308438A (zh) * | 2017-07-28 | 2019-02-05 | 上海形趣信息科技有限公司 | 动作识别库的建立方法、电子设备、存储介质 |
| JP2019091138A (ja) * | 2017-11-13 | 2019-06-13 | 株式会社日立製作所 | 画像検索装置、画像検索方法、及び、それに用いる設定画面 |
| JP2020135747A (ja) * | 2019-02-25 | 2020-08-31 | 株式会社日立ソリューションズ | 行動分析装置および行動分析方法 |
| JP2020135551A (ja) * | 2019-02-21 | 2020-08-31 | セコム株式会社 | 対象物認識装置、対象物認識方法、及び対象物認識プログラム |
2021
- 2021-11-15 WO PCT/JP2021/041928 patent/WO2023084780A1/ja not_active Ceased
- 2021-11-15 US US18/708,227 patent/US20250014212A1/en active Pending
- 2021-11-15 JP JP2023559386A patent/JP7726291B2/ja active Active
Also Published As
| Publication number | Publication date |
|---|---|
| US20250014212A1 (en) | 2025-01-09 |
| JPWO2023084780A1 | 2023-05-19 |
| JP7726291B2 (ja) | 2025-08-20 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21964133; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2023559386; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 18708227; Country of ref document: US |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 21964133; Country of ref document: EP; Kind code of ref document: A1 |