WO2023152971A1 - Image processing device, image processing method, and program - Google Patents
- Publication number
- WO2023152971A1 (application PCT/JP2022/005675)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- human body
- quality value
- image
- image processing
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Definitions
- The present invention relates to an image processing device, an image processing method, and a program.
- Patent Document 1 discloses technologies related to the present invention.
- Japanese Patent Laid-Open No. 2002-200000 describes a method of calculating a feature amount for each of a plurality of key points of a human body included in an image, and retrieving an image containing a human body with a similar posture or a similar movement based on the calculated feature amount.
- Techniques for grouping and classifying objects having similar postures and movements are disclosed.
- Non-Patent Document 1 discloses a technique related to human skeleton estimation.
- According to the technique disclosed in Patent Document 1, by registering in advance an image including a human body in a desired posture and desired movement as a template image, a human body in the desired posture and desired movement can be detected from the images to be processed.
- The inventors of the present invention have newly found that the accuracy of detection deteriorates unless an image of a certain quality is registered as a template image, and that there is room for improvement in the workability of preparing such a template image.
- Patent Document 1 and Non-Patent Document 1 mentioned above do not disclose this problem related to the template image or a solution to it, and therefore cannot solve the above problem.
- One example of the object of the present invention is to provide an image processing device, an image processing method, and a program that solve the workability problem of preparing a template image of a certain quality in view of the above-mentioned problems.
- An image processing device is provided that includes: skeletal structure detection means for detecting key points of a human body included in an image; calculating means for calculating, for each human body, a quality value of the detected key points; and output means for outputting information indicating a portion where a human body whose quality value is equal to or greater than a threshold value is captured, or a partial image obtained by cutting out the portion from the image.
- An image processing method is provided in which one or more computers: perform processing to detect key points of a human body included in an image; calculate, for each human body, a quality value of the detected key points; and output information indicating a portion where a human body whose quality value is equal to or greater than a threshold value is captured, or a partial image obtained by cutting out the portion from the image.
- A program is provided that causes a computer to function as: skeletal structure detection means for detecting key points of a human body included in an image; calculating means for calculating, for each human body, a quality value of the detected key points; and output means for outputting information indicating a portion where a human body whose quality value is equal to or greater than a threshold value is captured, or a partial image obtained by cutting out the portion from the image.
- An image processing device, an image processing method, and a program that solve the workability problem of preparing a template image of a certain quality are obtained.
- FIG. 4 is a diagram schematically showing an example of information output by an image processing device; 4 is a flow chart showing an example of the flow of processing of the image processing apparatus;
- FIG. 1 is a functional block diagram showing an overview of an image processing apparatus 10 according to the first embodiment.
- the image processing device 10 includes a skeleton structure detection section 11 , a calculation section 12 and an output section 13 .
- the skeletal structure detection unit 11 performs processing for detecting key points of the human body included in the image.
- the calculator 12 calculates the quality value of the detected keypoint for each human body.
- the output unit 13 outputs information indicating a portion in which a human body whose quality value is equal to or higher than a threshold value is captured, or a partial image obtained by cutting out the portion from the image.
- According to this image processing apparatus 10, it is possible to solve the workability problem of preparing a template image of a certain quality.
- <Second embodiment> "Overview"
- the image processing apparatus 10 calculates the quality value of the detected keypoint for each detected human body based on the certainty of the keypoint detection result. Then, the image processing apparatus 10 outputs information indicating a portion in which a human body whose quality value is equal to or higher than the threshold value is captured, or a partial image obtained by cutting out the portion from the image.
- the user can prepare a template image of a certain quality by selecting template images from the parts in which the human body whose quality value is equal to or higher than the threshold is shown.
- Each functional unit of the image processing apparatus is realized by any combination of hardware and software, centered on the CPU (Central Processing Unit) of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk for storing the program (which can store not only programs stored in advance from the stage of shipment of the apparatus but also programs downloaded from storage media such as CDs (Compact Discs) and from servers on the Internet), and an interface for network connection. It should be understood by those skilled in the art that there are various modifications to the implementation method and apparatus.
- FIG. 2 is a block diagram illustrating the hardware configuration of the image processing device 10.
- the image processing apparatus 10 has a processor 1A, a memory 2A, an input/output interface 3A, a peripheral circuit 4A and a bus 5A.
- the peripheral circuit 4A includes various modules.
- the image processing device 10 may not have the peripheral circuit 4A.
- the image processing apparatus 10 may be composed of a plurality of physically and/or logically separated devices. In this case, each of the plurality of devices can have the above hardware configuration.
- the bus 5A is a data transmission path for mutually transmitting and receiving data between the processor 1A, the memory 2A, the peripheral circuit 4A and the input/output interface 3A.
- the processor 1A is, for example, an arithmetic processing device such as a CPU or a GPU (Graphics Processing Unit).
- the memory 2A is, for example, RAM (Random Access Memory) or ROM (Read Only Memory).
- The input/output interface 3A includes an interface for acquiring information from an input device, an external device, an external server, an external sensor, a camera, etc., and an interface for outputting information to an output device, an external device, an external server, etc.
- Input devices are, for example, keyboards, mice, microphones, physical buttons, touch panels, and the like.
- the output device is, for example, a display, speaker, printer, mailer, or the like.
- the processor 1A can issue commands to each module and perform calculations based on the calculation results thereof.
- FIG. 1 is a functional block diagram showing an overview of an image processing apparatus 10 according to the second embodiment.
- the image processing device 10 includes a skeleton structure detection section 11 , a calculation section 12 and an output section 13 .
- the skeletal structure detection unit 11 performs processing to detect key points of the human body included in the image.
- The image is the original image from which the template image is obtained.
- a template image is an image that is registered in advance in the technology disclosed in Patent Document 1 described above, and is an image that includes a human body in a desired posture and desired movement (posture and movement that the user wants to detect).
- the image may be a moving image composed of a plurality of frame images, or may be a single still image.
- the skeletal structure detection unit 11 detects N (N is an integer equal to or greater than 2) keypoints of the human body included in the image. When moving images are to be processed, the skeletal structure detection unit 11 performs processing to detect key points for each frame image.
- the processing by the skeletal structure detection unit 11 is realized using the technique disclosed in Japanese Patent Application Laid-Open No. 2002-200012. Although the details are omitted, the technique disclosed in Patent Document 1 detects the skeleton structure using the skeleton estimation technique such as OpenPose disclosed in Non-Patent Document 1.
- the skeletal structure detected by this technique consists of "keypoints", which are characteristic points such as joints, and "bones (bone links)", which indicate links between keypoints.
- FIG. 3 shows the skeletal structure of the human body model 300 detected by the skeletal structure detection unit 11, and FIGS. 4 and 5 show detection examples of the skeletal structure.
- the skeletal structure detection unit 11 detects the skeletal structure of a human body model (two-dimensional skeletal model) 300 as shown in FIG.
- the human body model 300 is a two-dimensional model composed of key points such as human joints and bones connecting the key points.
- the skeletal structure detection unit 11 extracts feature points that can be keypoints from the image, refers to information obtained by machine learning the image of the keypoints, and detects N keypoints of the human body.
- the N keypoints to detect are predetermined.
- The number of keypoints to be detected (that is, the value of N) and which parts of the human body are to be detected as keypoints can vary, and all variations can be adopted.
- In the following, assume that head A1, neck A2, right shoulder A31, left shoulder A32, right elbow A41, left elbow A42, right hand A51, left hand A52, right hip A61, left hip A62, right knee A71, left knee A72, right foot A81, and left foot A82 are defined as the N keypoints (N = 14) to be detected.
- the human bones connecting these key points are bone B1 connecting head A1 and neck A2, bone B21 and bone B22 connecting neck A2 and right shoulder A31 and left shoulder A32, respectively.
- FIG. 4 is an example of detecting a person standing upright.
- an upright person is imaged from the front, and bones B1, B51 and B52, B61 and B62, and B71 and B72 viewed from the front are detected without overlapping each other.
- The right leg bones B61 and B71 are slightly more bent than the left leg bones B62 and B72.
- Fig. 5 is an example of detecting a crouching person.
- A crouching person is imaged from the right side, and bones B1, B51 and B52, B61 and B62, and B71 and B72 viewed from the right side are detected, with the right leg bones B61 and B71 and the left leg bones B62 and B72 greatly bent and overlapping.
- the calculation unit 12 calculates the quality value of the detected keypoints for each human body. Then, the calculation unit 12 identifies a portion in the image in which the human body is captured, where the quality value of the detected keypoint is equal to or greater than the threshold.
- the calculator 12 calculates the quality value of the detected keypoint.
- the "quality value of the detected keypoint" is a value indicating how good the quality of the detected keypoint is, and can be calculated based on various data.
- In this embodiment, the calculation unit 12 calculates the quality value based on the certainty of the keypoint detection result.
- Examples of calculating the quality value based on data other than the certainty of the keypoint detection result are described in the later embodiments.
- There is no particular restriction on the method of calculating the certainty. For example, in a skeleton estimation technique such as OpenPose, a score output in association with each detected keypoint may be used as the certainty of each keypoint.
- The calculation unit 12 calculates a higher quality value as the certainty of the keypoint detection result is higher. For example, the calculation unit 12 may calculate a statistical value (mean value, maximum value, minimum value, median value, mode value, weighted average value, etc.) of the certainty of each of the N keypoints detected from each human body as the quality value. If some of the N keypoints are not detected, the certainty of the undetected keypoints may be a fixed value such as "0". This fixed value is lower than the certainty of the detected keypoints.
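The statistic-based calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `keypoint_quality`, the dictionary input format, and the keypoint names are assumptions made for the example, and only a few of the listed statistics are shown.

```python
from statistics import mean, median

def keypoint_quality(confidences, n_keypoints, stat="mean", missing=0.0):
    """Quality value of one human body from per-keypoint confidences.

    confidences: dict mapping a keypoint name to the detector score
                 (only detected keypoints are present).
    n_keypoints: N, the predetermined number of keypoints to detect.
    missing:     fixed confidence assigned to undetected keypoints.
    """
    scores = list(confidences.values())
    # Undetected keypoints receive the fixed low value (e.g. 0).
    scores += [missing] * (n_keypoints - len(scores))
    funcs = {"mean": mean, "min": min, "max": max, "median": median}
    return funcs[stat](scores)

# Hypothetical body with 3 of N = 4 keypoints detected.
body = {"head": 0.9, "neck": 0.8, "right_shoulder": 0.7}
quality = keypoint_quality(body, n_keypoints=4)
```

Using the minimum instead of the mean makes a single undetected keypoint pull the quality value down to the fixed value, which is a stricter criterion.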
- When the image is a still image, the calculation unit 12 calculates the quality value for each human body detected from the still image.
- When the image is a moving image, the calculation unit 12 calculates the quality value for each human body detected from each of the plurality of frame images.
- the calculation unit 12 identifies a location in the image in which the human body is captured, where the quality value of the detected keypoint is equal to or greater than the threshold value, based on the calculation result of the process of calculating the quality value described above.
- the calculation unit 12 determines whether the quality value of the detected keypoint is equal to or greater than a threshold for each detected human body. Then, the calculation unit 12 identifies a portion where a human body whose quality value is equal to or higher than the threshold value is photographed according to the determination result.
- When the image is a still image, the "portion where a human body whose quality value is equal to or greater than the threshold is captured" is a partial area within that one still image.
- In this case, for example, coordinates in the coordinate system set in the image indicate the portion in the image where a human body whose detected keypoints have a quality value equal to or greater than the threshold is captured.
- When the image is a moving image, the "portion where a human body whose quality value is equal to or greater than the threshold is captured" is a partial area within each of the plurality of frame images that constitute the moving image.
- In this case as well, the portion in each frame image in which a human body whose quality value is equal to or greater than the threshold is captured is indicated.
- When the image is a moving image, it is preferable to specify, as the "portion where the human body is captured", a plurality of frame images that satisfy the condition that "the human body of the same person is continuously captured and the quality value of the keypoints detected from that human body is equal to or greater than the threshold".
- the calculation unit 12 may identify the human body of the same person appearing across a plurality of frame images.
- A method for realizing the identification is not particularly limited. For example, by using human tracking technology, face recognition technology, etc., the same person captured across multiple frame images can be identified, and the human bodies detected at the positions within each of those frame images where that person is captured may be specified as the human body of the same person. Through this process, the calculation unit 12 can identify a plurality of frame images in which the human body of the same person is continuously captured.
- The condition that "the quality value of the keypoints detected from the human body is equal to or greater than the threshold" may be required to hold for all of the multiple frame images. That is, the calculation unit 12 may identify a plurality of frame images in which the human body of the same person is continuously captured and in which the quality value of the keypoints detected from the human body is equal to or greater than the threshold in all the frame images.
- Alternatively, the above condition may be required to hold for at least some of the plurality of frame images. That is, the calculation unit 12 may identify a plurality of frame images in which the human body of the same person is continuously captured and in which the quality value of the keypoints detected from the human body is equal to or greater than the threshold in at least some of the frame images.
- In this case, an additional condition on the plurality of frame images, such as "the number of consecutive frame images showing a human body whose quality value is less than the threshold is Q or less", may be added. Adding such a condition suppresses the inconvenience that a portion where a human body with a low quality value appears consecutively for a predetermined number of frames or more is specified as a template image candidate.
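The frame-selection conditions above can be sketched as follows, assuming the per-frame quality values of one tracked person are already available as a list. The function name `candidate_segments` and the parameter `max_low_run` (standing in for Q) are hypothetical; with `max_low_run=0` the sketch corresponds to the variant in which every frame must satisfy the threshold.

```python
def candidate_segments(qualities, threshold, max_low_run=0):
    """Return (start, end) frame index pairs, inclusive, for one tracked person.

    A segment starts and ends on a frame whose quality value is at or above
    the threshold and never contains more than max_low_run consecutive
    frames below the threshold.
    """
    segments = []
    start = None      # first good frame of the current segment
    last_good = None  # most recent good frame of the current segment
    low_run = 0       # length of the current run of low-quality frames
    for i, q in enumerate(qualities):
        if q >= threshold:
            if start is None:
                start = i
            last_good = i
            low_run = 0
        elif start is not None:
            low_run += 1
            if low_run > max_low_run:
                # Too many low-quality frames in a row: close the segment.
                segments.append((start, last_good))
                start, last_good, low_run = None, None, 0
    if start is not None:
        segments.append((start, last_good))
    return segments

# Hypothetical per-frame quality values of the same person.
frame_quality = [0.9, 0.8, 0.2, 0.85, 0.1, 0.1, 0.9]
strict = candidate_segments(frame_quality, threshold=0.5)
relaxed = candidate_segments(frame_quality, threshold=0.5, max_low_run=1)
```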
- The output unit 13 outputs information indicating a portion where a human body whose quality value is equal to or higher than the threshold (that is, a human body whose detected keypoints have a quality value at or above the threshold) is captured, or a partial image obtained by cutting out that portion from the image.
- When the image is a moving image, for a plurality of frame images that continuously show the human body of the same person and satisfy the condition that "the quality value of the keypoints detected from the human body is equal to or greater than the threshold", the output unit 13 may output information indicating the location where the human body appears in each frame image, or partial images obtained by cutting out that location from each frame image.
- the image processing device 10 can have a processing unit that generates a partial image by extracting from the image a portion in which a human body whose quality value is equal to or greater than a threshold value is captured.
- the output unit 13 can output the partial image generated by the processing unit.
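Cutting out a partial image can be sketched as below. The representation is an assumption for illustration only: the image is a plain list of pixel rows and the location is a box `(x0, y0, x1, y1)` whose right and bottom edges are exclusive. The name `crop` is hypothetical.

```python
def crop(image, box):
    """Cut the region box = (x0, y0, x1, y1) out of an image.

    image: list of rows, each row a list of pixel values.
    The right and bottom edges (x1, y1) are exclusive.
    """
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

# Hypothetical 4x4 image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
partial = crop(image, (1, 1, 3, 3))
```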
- "Locations where a human body whose quality value is greater than or equal to the threshold is captured" are candidates for the template image. Based on the above information or the partial image, the user can browse locations where a human body whose quality value is greater than or equal to the threshold is captured, and can select a location including a human body with a desired posture and desired movement as a template image.
- An example of the information output by the output unit 13 is schematically shown in FIG. 6.
- In this example, human body identification information for mutually identifying a plurality of detected human bodies and attribute information of each human body are displayed in association with each other.
- As the attribute information, the quality value, information indicating the location in the image (information indicating the location where the human body is shown), and the date and time when the image was taken are displayed.
- The attribute information may also include information indicating the installation position (shooting position) of the camera that shot the image (e.g., the back of bus No. 102, the entrance to XX park, etc.) and attribute information of the person calculated by image analysis (e.g., sex, age group, body type, etc.).
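One way to hold the output information described above is a small record type. The class name `BodyRecord`, its field names, and the sample values are hypothetical and chosen only to mirror the attributes listed in the text; sorting by quality is a presentation choice of this sketch, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BodyRecord:
    body_id: str                       # human body identification information
    quality: float                     # quality value of the detected keypoints
    box: Tuple[int, int, int, int]     # location in the image (x0, y0, x1, y1)
    captured_at: str                   # date and time the image was taken
    camera_position: Optional[str] = None  # optional installation position

def sort_for_display(records):
    """Order records so that higher-quality candidates are listed first."""
    return sorted(records, key=lambda r: r.quality, reverse=True)

# Hypothetical records.
records = [
    BodyRecord("P001", 0.72, (10, 20, 110, 320), "2022-02-14 10:15"),
    BodyRecord("P002", 0.91, (200, 30, 290, 310), "2022-02-14 10:15",
               camera_position="entrance to XX park"),
]
ordered = sort_for_display(records)
```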
- When the original image of the template image is input, the image processing device 10 performs processing for detecting key points of the human body included in the image (S10). Next, the image processing apparatus 10 calculates quality values of the detected keypoints for each detected human body (S11). Next, the image processing apparatus 10 determines whether the quality value of the detected keypoints is equal to or greater than a threshold for each detected human body (S12). Next, the image processing apparatus 10 identifies a portion where a human body whose quality value is equal to or higher than the threshold value is captured according to the determination result of S12 (S13). Then, the image processing apparatus 10 outputs information indicating a portion in which a human body whose quality value is equal to or higher than the threshold value is captured, or a partial image obtained by cutting out the portion from the image (S14).
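The flow S10 to S14 can be sketched as a single function. The detector and the quality function are passed in as callables because the source does not fix them; `detect_bodies`, `quality_fn`, and the keypoint tuple format `(x, y, confidence)` are assumptions of this sketch, and the bounding box of the keypoints is used here as one possible way to express the "portion where the human body is captured".

```python
def select_template_candidates(image, detect_bodies, quality_fn, threshold):
    """Return location information for human bodies whose quality passes."""
    bodies = detect_bodies(image)                       # S10: detect keypoints
    results = []
    for body_id, keypoints in enumerate(bodies):
        if not keypoints:
            continue
        quality = quality_fn(keypoints)                 # S11: quality value
        if quality >= threshold:                        # S12: threshold check
            xs = [x for x, _, _ in keypoints]
            ys = [y for _, y, _ in keypoints]
            box = (min(xs), min(ys), max(xs), max(ys))  # S13: locate the body
            results.append({"body_id": body_id, "quality": quality, "box": box})
    return results                                      # S14: output

# Stand-in detector and quality function for illustration only.
def fake_detector(_image):
    return [
        [(10, 20, 0.9), (30, 60, 0.8)],
        [(5, 5, 0.2), (6, 9, 0.3)],
    ]

def mean_confidence(keypoints):
    return sum(c for _, _, c in keypoints) / len(keypoints)

candidates = select_template_candidates(None, fake_detector, mean_confidence, 0.5)
```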
- the image processing apparatus 10 of the third embodiment differs from those of the first and second embodiments in the method of calculating the quality value.
- the calculation unit 12 calculates the quality value of a human body with a relatively large number of detected keypoints higher than the quality value of a human body with a relatively small number of detected keypoints. For example, the calculation unit 12 may use the number of detected keypoints as the quality value. In addition, a weighting point may be set for each of a plurality of keypoints. A higher weighting point is set for a relatively more important keypoint. Then, the calculation unit 12 may calculate a value obtained by adding the weighting points of the detected keypoints as the quality value.
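Both variants described above (a plain count and a sum of weighting points) fit in a few lines. The function name `count_quality`, the keypoint names, and the weight values are hypothetical.

```python
def count_quality(detected, weights=None):
    """Quality value based on which keypoints were detected.

    detected: iterable of names of the detected keypoints.
    weights:  optional dict of weighting points per keypoint; when omitted,
              the quality value is simply the number of detected keypoints.
    """
    detected = list(detected)
    if weights is None:
        return len(detected)
    return sum(weights[name] for name in detected)

# Hypothetical weighting points: more important keypoints score higher.
weights = {"head": 2.0, "neck": 2.0, "right_hand": 1.0, "left_hand": 1.0}
plain = count_quality(["head", "neck", "right_hand"])
weighted = count_quality(["head", "neck", "right_hand"], weights)
```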
- The calculation unit 12 may calculate the quality value by combining the method described in the second embodiment and the method based on the number of detected keypoints. For example, the calculation unit 12 calculates a first quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the second embodiment, and calculates a second quality value by normalizing, according to a predetermined rule, the quality value calculated by the method based on the number of detected keypoints. Then, the calculation unit 12 may calculate a statistical value (average value, maximum value, minimum value, median value, mode value, weighted average value, etc.) of the first quality value and the second quality value as the quality value of the human body.
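The combination of normalized quality values can be sketched as follows. The source leaves the "predetermined rule" for normalization open; min-max scaling to the range 0 to 1 is an assumption made here, as are the function names `normalize` and `combine` and the choice of a weighted average as the statistic.

```python
def normalize(value, lo, hi):
    """Min-max scale value into [0, 1], clamping values outside [lo, hi]."""
    scaled = (value - lo) / (hi - lo)
    return max(0.0, min(1.0, scaled))

def combine(values, weights=None):
    """Weighted average of already-normalized quality values."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# First quality value: mean keypoint confidence, already in [0, 1].
first = normalize(0.8, 0.0, 1.0)
# Second quality value: 7 of N = 14 keypoints detected.
second = normalize(7, 0, 14)
body_quality = combine([first, second])
```

Replacing the weighted average with `min` gives a stricter rule in which a body must score well on every criterion.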
- According to the image processing apparatus 10 of the third embodiment, effects similar to those of the first and second embodiments are achieved. Further, according to the image processing apparatus 10 of the third embodiment, it is possible to present to the user, as a template image candidate, a portion where a human body from which many key points are detected is captured. The user can easily prepare a template image that satisfies a certain level of quality in terms of the number of detected keypoints by selecting the template image from among the presented template image candidates.
- the image processing apparatus 10 of the fourth embodiment differs from those of the first to third embodiments in the method of calculating the quality value.
- the calculation unit 12 calculates the quality value based on the degree of overlap with other human bodies.
- Here, "a state in which the human body of person A overlaps that of person B" includes a state in which the human body of person A is partially or wholly hidden by the human body of person B, a state in which the human body of person B is partially or wholly hidden by the human body of person A, and a state in which both occur.
- the calculation method will be specifically described below.
- the calculator 12 calculates the quality value of a human body that does not overlap with other human bodies to be higher than the quality value of a human body that overlaps with other human bodies.
- For example, a rule is created in advance and stored in the image processing apparatus 10 so that the quality value of a human body that does not overlap with another human body is set to X1, and the quality value of a human body that overlaps with another human body is set to X2.
- Note that X1 > X2.
- Based on this rule, the calculation unit 12 calculates the quality value of a human body that does not overlap with other human bodies as X1, and the quality value of a human body that overlaps with other human bodies as X2.
- The output unit 13 can then output information indicating a portion where a human body whose quality value is Y or higher is captured, or a partial image obtained by cutting out the portion from the image. Note that X1 > Y > X2.
- Whether or not the human body overlaps with another human body may be identified based on the degree of overlap of the human body model 300 (see FIG. 3) detected by the skeletal structure detection unit 11, or may be identified based on the degree of overlap of the bodies captured in the image.
- A threshold used in this overlap determination may be a variable value that varies depending on the size of the detected human body within the image. The larger the size of the detected human body in the image, the larger the threshold.
- As the size of the detected human body in the image, the length of a predetermined bone (e.g., bone B1 connecting head A1 and neck A2) or the size of the face in the image may be used.
- Alternatively, if any bone of a certain human body intersects with any bone of another human body, it may be determined that the two human bodies overlap each other.
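The bone-intersection test can be sketched with a standard 2D segment crossing check based on orientation signs. This is an illustrative sketch under stated assumptions: each bone is given as a pair of `(x, y)` keypoint coordinates, only proper crossings are counted (segments that merely touch or are collinear are treated as non-intersecting), and the function names are hypothetical.

```python
def _orient(p, q, r):
    """Signed area of triangle p, q, r (positive if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_intersect(a1, a2, b1, b2):
    """True if segment a1-a2 properly crosses segment b1-b2."""
    d1 = _orient(b1, b2, a1)
    d2 = _orient(b1, b2, a2)
    d3 = _orient(a1, a2, b1)
    d4 = _orient(a1, a2, b2)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0

def bodies_overlap(bones_a, bones_b):
    """True if any bone of one body crosses any bone of the other body."""
    return any(
        segments_intersect(a1, a2, b1, b2)
        for a1, a2 in bones_a
        for b1, b2 in bones_b
    )

# Hypothetical bones, each a pair of keypoint coordinates.
person_a = [((0, 0), (2, 2))]
person_b = [((0, 2), (2, 0))]
person_c = [((5, 5), (6, 5))]
```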
- In another example, the calculation unit 12 calculates the quality value of a human body that does not overlap with other human bodies to be higher than that of a human body that overlaps with other human bodies, and, among human bodies that overlap with other human bodies, calculates the quality value of the human body positioned on the front side to be higher than that of the human body positioned on the rear side.
- That is, the calculation unit 12 calculates the highest quality value for a human body that does not overlap with other human bodies, the second highest quality value for a human body that overlaps with another human body but is positioned in front of it, and the lowest quality value for a human body that overlaps with another human body and is positioned behind it.
- For example, a rule is created in advance and stored in the image processing apparatus 10 that sets X1 as the quality value of a human body that does not overlap other human bodies, X21 as the quality value of a human body that overlaps another human body and is located in front, and X22 as the quality value of a human body that overlaps another human body and is located in the back. Note that X1 > X21 > X22. Then, based on the rule, the calculation unit 12 calculates the quality value of a human body that does not overlap with other human bodies as X1, the quality value of a human body that overlaps with another human body and is located on the front side as X21, and the quality value of a human body that overlaps with another human body and is located on the back side as X22.
- The output unit 13 can then output information indicating a portion where a human body whose quality value is Z or higher is captured, or a partial image obtained by cutting out the portion from the image. Note that X1 > X21 > Z > X22 or X1 > Z > X21 > X22.
- Whether the human body is positioned in front of or behind the other human body may be specified based on the degree to which the human body model 300 (see FIG. 3) detected by the skeletal structure detection unit 11 is hidden or missing, or based on the degree to which the body is hidden in the image. For example, if all N keypoints are detected in one of two human bodies overlapping each other, and only some of the N keypoints are detected in the other, it can be determined that the human body in which all N keypoints are detected is located on the front side and the other human body is located on the rear side.
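The three-level rule and the front/back judgment based on detected keypoint counts can be sketched as below. The numeric values of X1, X21, and X22 are hypothetical (the text only requires X1 > X21 > X22), as are the function names. Treating undecidable cases, such as both bodies having all keypoints detected, as "behind" is an assumption of this sketch, not something the source states.

```python
# Hypothetical rule values; the text only requires X1 > X21 > X22.
X1, X21, X22 = 1.0, 0.6, 0.2

def overlap_quality(overlaps, n_detected, n_detected_other, n_keypoints=14):
    """Quality value of one body from its overlap with another body.

    overlaps:         whether this body overlaps another body.
    n_detected:       number of keypoints detected for this body.
    n_detected_other: number of keypoints detected for the overlapping body.
    """
    if not overlaps:
        return X1
    if n_detected == n_keypoints and n_detected_other < n_keypoints:
        return X21  # all N keypoints visible: judged to be in front
    return X22      # judged to be behind (assumed fallback when undecidable)

def is_candidate(quality, z):
    """Output condition: quality value is Z or higher."""
    return quality >= z

front = overlap_quality(True, 14, 9)
back = overlap_quality(True, 9, 14)
alone = overlap_quality(False, 14, 0)
```

With Z chosen between X21 and X22, bodies in front are still offered as candidates; with Z between X1 and X21, only non-overlapping bodies are.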
- The calculation unit 12 may calculate the quality value by combining at least one of the methods described in the second and third embodiments and the method based on the degree of overlap with other human bodies. For example, the calculation unit 12 performs at least one of a process of calculating a first quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the second embodiment, and a process of calculating a second quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the third embodiment. Further, the calculation unit 12 calculates a third quality value by normalizing, according to a predetermined rule, the quality value calculated by the method based on the degree of overlap with other human bodies.
- Then, the calculation unit 12 may calculate a statistical value (average value, maximum value, minimum value, median value, mode value, weighted average value, etc.) of the third quality value and at least one of the first and second quality values as the quality value of the human body.
- According to the image processing apparatus 10 of the fourth embodiment, effects similar to those of the first to third embodiments are achieved. Further, according to the image processing apparatus 10 of the fourth embodiment, it is possible to present to the user, as template image candidates, portions in which a human body that does not overlap another human body is captured. In addition to such portions, portions in which a human body that overlaps with another human body but is located in front is captured can also be presented to the user as template image candidates. By selecting a template image from among the template image candidates presented in this way, the user can easily prepare a template image that satisfies a certain quality in terms of the degree of overlap with other human bodies.
- the image processing apparatus 10 of the fifth embodiment differs from the first to fourth embodiments in the method of calculating the quality value.
- the skeletal structure detection unit 11 performs a process of detecting a human region within an image and detecting key points within the detected human region. That is, the skeletal structure detection unit 11 does not subject all regions in the image to the process of detecting keypoints, but subjects only the detected human region to the process of detecting keypoints.
- the details of processing for detecting a person region in an image are not particularly limited, and may be implemented using object detection technology such as YOLO, for example.
- The calculation unit 12 calculates a quality value based on the certainty of the human region detection result.
- There is no particular restriction on the method of calculating the degree of certainty of the human region detection result. For example, in an object detection technique such as YOLO, a score (also referred to as reliability or the like) output in association with a detected object region may be used as the certainty of each person region.
- the calculation unit 12 calculates a higher quality value as the degree of certainty of the human region detection result is higher.
- the calculation unit 12 may calculate the certainty of the human region detection result as the quality value.
- the calculation unit 12 may calculate the quality value by combining at least one of the methods described in the second to fourth embodiments and a method based on the degree of certainty of the human region detection result. For example, the calculation unit 12 normalizes the quality value calculated by the method described in the second embodiment according to a predetermined rule to calculate the first quality value, and calculates the first quality value by the method described in the third embodiment. A second quality value is calculated by normalizing the obtained quality value according to a predetermined rule, and a third quality value is calculated by normalizing the quality value calculated by the method described in the fourth embodiment according to a predetermined rule. At least one of the calculation processes is performed.
- the calculation unit 12 also normalizes, according to a predetermined rule, the quality value calculated by the method based on the degree of certainty of the human region detection result to calculate a fourth quality value. Then, the calculation unit 12 may calculate a statistical value (average value, maximum value, minimum value, median value, mode value, weighted average value, etc.) of the fourth quality value and at least one of the first to third quality values as the quality value of the human body.
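- The normalize-then-aggregate step can be sketched as follows. The text only says "a predetermined rule", so min-max scaling is an assumption here, as are the function names and the dictionary keys used to label each per-method quality value.

```python
from statistics import mean, median
from typing import Dict, Optional


def min_max_normalize(value: float, low: float, high: float) -> float:
    """Map a raw quality value into [0, 1]; min-max scaling is an assumed rule."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def combine_quality_values(
    values: Dict[str, float],
    statistic: str = "mean",
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Reduce normalized per-method quality values to one quality value per human body."""
    if not values:
        raise ValueError("at least one quality value is required")
    if statistic == "weighted_mean":
        if weights is None:
            raise ValueError("weights are required for weighted_mean")
        total_weight = sum(weights[name] for name in values)
        return sum(values[name] * weights[name] for name in values) / total_weight
    reducers = {"mean": mean, "max": max, "min": min, "median": median}
    return reducers[statistic](list(values.values()))
```

For instance, a key-point-based value normalized to 0.5 and a detection-certainty value of 0.9 average to 0.7, while a weighted mean with weights 1 and 3 gives 0.8.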
- According to the image processing apparatus 10 of the fifth embodiment, effects similar to those of the first to fourth embodiments are achieved. Further, according to the image processing apparatus 10 of the fifth embodiment, it is possible to present to the user, as a template image candidate, a portion in which a person is captured with a high degree of certainty. By selecting a template image from among the template image candidates presented in this way, the user can easily prepare a template image in which the human region detection result satisfies a certain level of quality.
- the image processing apparatus 10 of the sixth embodiment differs from the first to fifth embodiments in the method of calculating the quality value.
- the calculation unit 12 calculates the quality value based on the size of the human body on the image.
- the calculation unit 12 calculates the quality value of a relatively large human body to be higher than the quality value of a relatively small human body.
- the size of the human body on the image may be indicated by the size (area, etc.) of the human region described in the fifth embodiment, by the length of a predetermined bone (e.g., bone B1), by the length between two predetermined key points (e.g., key points A31 and A32), or by other methods.
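- The size measures listed above can be sketched as follows. The helper names are hypothetical, a bone length is treated as the distance between the two key points at its ends, and mapping a size to a quality value by dividing by a reference size and capping at 1 is an assumption of this example rather than something the text prescribes.

```python
import math
from typing import Tuple

Point = Tuple[float, float]              # (x, y) key point position
Box = Tuple[float, float, float, float]  # (x, y, width, height) of a person region


def region_area(box: Box) -> float:
    """Size of the human body expressed as the area of its person region."""
    _, _, width, height = box
    return width * height


def keypoint_distance(a: Point, b: Point) -> float:
    """Size expressed as the length between two key points (or the ends of a bone)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def size_quality(size: float, reference_size: float) -> float:
    """Assumed mapping: larger bodies score higher, capped at 1.0."""
    if reference_size <= 0:
        raise ValueError("reference_size must be positive")
    return min(1.0, max(0.0, size / reference_size))
```

Any of the three measures can be fed to `size_quality`, provided the reference size is expressed in the same unit (pixels for lengths, square pixels for areas).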
- the calculation unit 12 may calculate the quality value by combining at least one of the methods described in the second to fifth embodiments and a method based on the size of the human body on the image. For example, the calculation unit 12 performs at least one of the following calculation processes: calculating a first quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the second embodiment; calculating a second quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the third embodiment; calculating a third quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the fourth embodiment; and calculating a fourth quality value by normalizing, according to a predetermined rule, the quality value calculated by the method described in the fifth embodiment.
- the calculation unit 12 also normalizes, according to a predetermined rule, the quality value calculated by the method based on the size of the human body on the image to calculate a fifth quality value. Then, the calculation unit 12 may calculate a statistical value (average value, maximum value, minimum value, median value, mode value, weighted average value, etc.) of the fifth quality value and at least one of the first to fourth quality values as the quality value of the human body.
- According to the image processing apparatus 10 of the sixth embodiment, effects similar to those of the first to fifth embodiments are achieved.
- By selecting a template image from among the presented template image candidates, the user can easily prepare a template image in which the size of the human body satisfies a certain level of quality.
- <Modification 1> When a plurality of images of the same person photographed simultaneously by a plurality of cameras are input to the image processing device 10, and the quality values of the key points of the human body of the same person detected from each of the plurality of images are equal to or greater than the threshold, the output unit 13 may output information indicating the location where the human body with the highest quality value among the human bodies of the same person detected from the plurality of images is captured, or a partial image obtained by cutting out that location from the image.
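- The selection in Modification 1 can be sketched as follows. The `Detection` record and the function name are hypothetical, and the sketch assumes that detections of the same person across cameras have already been given a shared `person_id`; how that association is made is outside the scope of this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Detection:
    """Hypothetical record for one human body detected in one camera image."""
    camera_id: str
    person_id: str                   # assumed to be matched across cameras beforehand
    box: Tuple[int, int, int, int]   # (x, y, width, height) of the location in the image
    quality: float


def best_view_per_person(
    detections: Iterable[Detection], threshold: float
) -> Dict[str, Detection]:
    """For each person, keep the detection with the highest quality value at or above the threshold."""
    best: Dict[str, Detection] = {}
    for det in detections:
        if det.quality < threshold:
            continue  # below-threshold detections are never output
        current = best.get(det.person_id)
        if current is None or det.quality > current.quality:
            best[det.person_id] = det
    return best
```

The returned detection carries both the camera identifier and the box, which is enough to report the location or to cut the partial image out of the corresponding camera image.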
- image identification information is included in the "information indicating a location where a human body whose quality value is equal to or greater than the threshold is captured".
- the output unit 13 outputs information indicating such a portion and a partial image obtained by cutting out such a portion from the image.
- This configuration assumes that one frame image can include a plurality of human bodies.
- the location where a human body whose quality value is equal to or higher than the threshold is captured may be some of the plurality of frame images that make up a moving image. In that case, the output unit 13 may output information indicating those frame images, or a partial image obtained by cutting out those frame images from the image. Also, a frame image itself showing a human body whose quality value is equal to or higher than the threshold may be output as a template image candidate. This configuration assumes that one frame image can include only one human body whose quality value is equal to or higher than the threshold.
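- One way to express "some of the plurality of frame images" is as ranges of frame indices. The sketch below groups consecutive frames whose quality value reaches the threshold into inclusive ranges; grouping into consecutive runs and the function name are assumptions of this example, and it relies on the stated premise that each frame contains at most one such human body.

```python
from typing import List, Optional, Tuple


def frame_ranges_meeting_threshold(
    frame_qualities: List[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of consecutive frames with quality >= threshold."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for index, quality in enumerate(frame_qualities):
        if quality >= threshold:
            if start is None:
                start = index  # a new run of qualifying frames begins here
        elif start is not None:
            ranges.append((start, index - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        ranges.append((start, len(frame_qualities) - 1))  # run extends to the last frame
    return ranges
```

Each returned range can be reported as is, or used to cut the corresponding frames out of the moving image.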
- 1. An image processing device having: skeletal structure detection means for detecting key points of a human body included in an image; calculating means for calculating a quality value of the detected key points for each human body; and output means for outputting information indicating a location where a human body whose quality value is equal to or greater than a threshold is captured, or a partial image obtained by cutting out the location from the image.
- 3. The image processing device according to 1 or 2, wherein the skeletal structure detection means detects a person region in the image and performs a process of detecting the key points in the detected person region, and the calculating means calculates the quality value based on the certainty of the person region detection result.
- 4. The image processing device according to any one of 1 to 3, wherein the calculating means calculates the quality value based on a degree of overlap with another human body.
- 5. The image processing device according to 4, wherein the calculating means calculates the quality value of a human body that does not overlap with another human body to be higher than the quality value of a human body that overlaps with another human body.
- 6. The image processing device according to 5, wherein the calculating means calculates the quality value of a human body located on the front side, among the human bodies overlapping other human bodies, to be higher than the quality value of a human body located on the rear side.
- 7. The image processing device according to any one of 1 to 6, wherein the calculating means calculates the quality value of a human body with a relatively large number of detected key points to be higher than the quality value of a human body with a relatively small number of detected key points.
- 8. The calculating means calculates the quality value based on the size of the human body on the image.
- 9. An image processing method in which one or more computers: detect key points of a human body included in an image; calculate a quality value of the detected key points for each human body; and output information indicating a location where a human body whose quality value is equal to or greater than a threshold is captured, or a partial image obtained by cutting out the location from the image.
- 10. A program causing a computer to function as: skeletal structure detection means for detecting key points of a human body included in an image; calculating means for calculating a quality value of the detected key points for each human body; and output means for outputting information indicating a location where a human body whose quality value is equal to or greater than a threshold is captured, or a partial image obtained by cutting out the location from the image.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/005675 WO2023152971A1 (ja) | 2022-02-14 | 2022-02-14 | Image processing device, image processing method, and program |
| JP2023580041A JP7708225B2 (ja) | 2022-02-14 | 2022-02-14 | Image processing device, image processing method, and program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/005675 WO2023152971A1 (ja) | 2022-02-14 | 2022-02-14 | Image processing device, image processing method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023152971A1 (ja) | 2023-08-17 |
Family
ID=87564097
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2022/005675 (Ceased) WO2023152971A1 (ja) | Image processing device, image processing method, and program | 2022-02-14 | 2022-02-14 |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7708225B2 |
| WO (1) | WO2023152971A1 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021135877A (ja) * | 2020-02-28 | 2021-09-13 | KDDI Corporation | Skeleton tracking method, device, and program |
| WO2021229751A1 (ja) * | 2020-05-14 | 2021-11-18 | NEC Corporation | Image selection device, image selection method, and program |
| WO2021250808A1 (ja) * | 2020-06-10 | 2021-12-16 | NEC Corporation | Image processing device, image processing method, and program |
-
2022
- 2022-02-14 JP JP2023580041A patent/JP7708225B2/ja active Active
- 2022-02-14 WO PCT/JP2022/005675 patent/WO2023152971A1/ja not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2023152971A1 | 2023-08-17 |
| JP7708225B2 (ja) | 2025-07-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
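| US9002099B2 (en) | | Learning-based estimation of hand and finger pose |
| JP7409499B2 (ja) | | Image processing device, image processing method, and program |
| JP7416252B2 (ja) | | Image processing device, image processing method, and program |
| JP2023504319A (ja) | | Method, apparatus, device, and storage medium for associating a human body with a human hand |
| Yen et al. | | Adaptive indoor people-counting system based on edge AI computing |
| JP7485040B2 (ja) | | Image processing device, image processing method, and program |
| JP7708182B2 (ja) | | Image processing device, image processing method, and program |
| WO2023084780A1 (ja) | | Image processing device, image processing method, and program |
| JP7364077B2 (ja) | | Image processing device, image processing method, and program |
| JP7435781B2 (ja) | | Image selection device, image selection method, and program |
| JP7658380B2 (ja) | | Image selection device, image selection method, and program |
| WO2023089690A1 (ja) | | Search device, search method, and program |
| JP7501622B2 (ja) | | Image selection device, image selection method, and program |
| JP7435754B2 (ja) | | Image selection device, image selection method, and program |
| JP7697545B2 (ja) | | Image processing device, image processing method, and program |
| JP7708225B2 (ja) | | Image processing device, image processing method, and program |
| WO2023152974A1 (ja) | | Image processing device, image processing method, and program |
| JP7708226B2 (ja) | | Image processing device, image processing method, and program |
| JP7632608B2 (ja) | | Image processing device, image processing method, and program |
| JP7589744B2 (ja) | | Image selection device, image selection method, and program |
| Yousefi et al. | | 3D hand gesture analysis through a real-time gesture search engine |
| JP7501621B2 (ja) | | Image selection device, image selection method, and program |
| JP7375921B2 (ja) | | Image classification device, image classification method, and program |
| JP7468642B2 (ja) | | Image processing device, image processing method, and program |
| JP2018156544A (ja) | | Information processing device and program |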
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22925990; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 18833960; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2023580041; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22925990; Country of ref document: EP; Kind code of ref document: A1 |