WO2018135326A1 - Image processing device, image processing system, image processing program, and image processing method

Publication number
WO2018135326A1
Authority
WO
WIPO (PCT)
Application number
PCT/JP2018/000120
Other languages
French (fr)
Japanese (ja)
Inventor
源太 鈴木
Original Assignee
富士通株式会社
Application filed by 富士通株式会社
Publication of WO2018135326A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/70: Determining position or orientation of objects or cameras

Definitions

  • the present invention relates to an image processing apparatus, an image processing system, an image processing program, and an image processing method.
  • In recent years, with the widespread use of three-dimensional distance sensors, many techniques have been developed that detect the user's skeleton, joint positions, and the like from distance images and recognize gestures and other actions based on the detection results (for example, Patent Document 1 and Patent Document 2).
  • the distance sensor and the distance image may be referred to as a depth sensor and a depth image, respectively.
  • A method of thinning lines in a binarized image, a method of converting a sequence of continuous points into a plurality of approximate line segments, a method of obtaining the angle formed by three points, standard values for the dimensions of the human body, and the like are also known (for example, Non-Patent Document 1 to Non-Patent Document 4).
  • an object of the present invention is to accurately identify the position of a hand from a distance image of a hand holding an object.
  • the image processing apparatus includes a detection unit, an extraction unit, and a specification unit.
  • the detection unit detects a candidate area where a hand is captured from the distance image.
  • the extraction unit extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region.
  • the specifying unit specifies the position of the hand in the candidate area using a line segment that approximates the hand area.
  • the position of the hand can be accurately identified from the distance image of the hand holding the object.
  • FIG. 1 shows an example of a three-dimensional distance sensor installed in an assembly line.
  • the worker 101 performs product assembly work while holding an object 103 such as a tool or a part on the work table 102.
  • The field of view 112 of the three-dimensional distance sensor 111 installed above the work table 102 includes the hands of the worker 101, and the three-dimensional distance sensor 111 can obtain a distance image showing the hands during work.
  • When the worker 101 holds the object 103, the difference between the distance value of the hand and the distance value of the object 103 in the distance image becomes small. For this reason, it is difficult to separate the hand and the object 103 by a method that separates the foreground and the background based on distance values. Even if the hand region is detected based on the difference between the captured distance image and a background distance image showing only the background, both the hand and the object 103 appear in the background difference, and it is therefore difficult to separate them.
  • In a known technique, the skeleton of the user's body parts is recognized by tracking using machine learning, and a tool held by the user is recognized by tracking in a passive method or an active method.
  • In passive tracking, an infrared retro-reflective marker is attached to the tool, and the tool is recognized by detecting the marker with an external device such as a camera.
  • In active tracking, a three-dimensional position sensor, an acceleration sensor, and the like are built into the tool, and these sensors notify the tracking system of the position of the tool.
  • FIG. 2 shows a functional configuration example of the image processing apparatus.
  • the image processing apparatus 201 in FIG. 2 includes a detection unit 211, an extraction unit 212, and a specification unit 213.
  • FIG. 3 is a flowchart illustrating an example of image processing performed by the image processing apparatus 201 in FIG.
  • the detection unit 211 detects a candidate area where a hand is shown from the distance image (step 301).
  • The extraction unit 212 extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on the angle between two adjacent line segments among the plurality of line segments that approximate the candidate region (step 302).
  • the specifying unit 213 specifies the position of the hand in the candidate area using a line segment that approximates the hand area (step 303).
  • the position of the hand can be accurately identified from the distance image of the hand holding the object.
  • FIG. 4 shows a first specific example of the image processing apparatus 201 of FIG.
  • the image processing apparatus 201 in FIG. 4 includes a detection unit 211, an extraction unit 212, a specification unit 213, an output unit 411, and a storage unit 412.
  • the extraction unit 212 includes a line segment detection unit 421 and a determination unit 422.
  • the imaging device 401 is an example of a three-dimensional distance sensor.
  • the imaging device 401 captures a distance image 432 including a pixel value representing the distance from the imaging device 401 to the subject and outputs the captured image to the image processing device 201.
  • an infrared camera can be used as the imaging device 401.
  • FIG. 5 shows an example of the field of view range of the imaging apparatus 401.
  • The imaging device 401 in FIG. 5 is installed above the work table 102, and the visual field range 501 includes the work table 102, the left hand 502 and the right hand 503 of the worker 101, and the object 103 held by the right hand 503. Therefore, the work table 102, the left hand 502, the right hand 503, and the object 103 are shown in the distance image 432 captured by the imaging device 401.
  • the storage unit 412 stores a background distance image 431 and a distance image 432.
  • the background distance image 431 is a distance image captured in advance in a state where the left hand 502, the right hand 503, and the object 103 are not included in the visual field range 501.
  • the detecting unit 211 detects a candidate area from the distance image 432 using the background distance image 431 and stores area information 433 indicating the detected candidate area in the storage unit 412.
  • the candidate area is an area in which at least one of the left hand 502 or the right hand 503 of the worker 101 is estimated to be captured. For example, an area in contact with the end of the distance image 432 is detected as a candidate area.
  • the line segment detection unit 421 obtains a curve by thinning the candidate area indicated by the area information 433, and obtains a plurality of connected line segments that approximate the curve as a plurality of line segments that approximate the candidate area. Then, the line segment detection unit 421 stores line segment information 434 indicating those line segments in the storage unit 412.
  • The determination unit 422 obtains the angle between the two line segments for each combination of two adjacent line segments included in the plurality of line segments indicated by the line segment information 434, and obtains the length in the three-dimensional space corresponding to each line segment. Then, the determination unit 422 determines whether each line segment corresponds to the hand region using the obtained angles and lengths, and extracts a line segment that approximates the hand region from the plurality of line segments indicated by the line segment information 434.
  • When the angle between two adjacent line segments is smaller than a threshold θmin, the determination unit 422 excludes, from the line segment candidates that approximate the hand region, the line segment that is farther from the end of the distance image 432 of the two.
  • θmin may be a threshold value representing the lower limit value of the bending angle of the wrist or elbow joint.
  • The determination unit 422 also determines whether or not to exclude, from the line segment candidates that approximate the hand region, the line segment that is farther from the end of the distance image 432 of two adjacent line segments. For this determination, the length in the three-dimensional space of the line segment closer to the end of the distance image 432 is used.
  • Furthermore, from the remaining line segments, the determination unit 422 selects the line segment whose distance to the end of the distance image 432 is the greatest, and extracts the portion of that line segment that approximates the hand region based on the distance from each of a plurality of points on the line segment to the contour of the candidate region. Then, the determination unit 422 stores hand region line segment information 435 indicating the extracted portion in the storage unit 412.
  • The specifying unit 213 specifies the position of the hand in the candidate region using the contour of the candidate region indicated by the region information 433 and the line segment indicated by the hand region line segment information 435, and stores position information 436 indicating the specified position in the storage unit 412.
  • the output unit 411 outputs a recognition result based on the position information 436.
  • the output unit 411 may be a display unit that displays the recognition result on the screen, or may be a transmission unit that transmits the recognition result to another image processing apparatus.
  • the recognition result may be a trajectory indicating a change in the three-dimensional position of the hand, or information indicating an operator's motion estimated from the hand trajectory.
  • According to the image processing apparatus 201 in FIG. 4, line segments corresponding to the object 103 can be excluded from the line segments approximating the candidate region detected from the distance image 432, based on the angles between line segments and the lengths of the line segments in the three-dimensional space. Then, by identifying the position of the hand using the remaining line segments, including the line segment that is farthest from the end of the distance image 432, the position of the right hand 503 can be specified with high accuracy even if the object 103 is an unknown object. Therefore, the recognition accuracy of the position of a hand close to an unknown object is improved, and the recognition accuracy of the worker's motion is also improved.
  • FIG. 6 is a flowchart showing a specific example of image processing performed by the image processing apparatus 201 in FIG.
  • the detection unit 211 performs candidate area detection processing (step 601), and then the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 602).
  • FIG. 7 is a flowchart showing an example of candidate area detection processing in step 601 of FIG.
  • the detection unit 211 subtracts the pixel value of each pixel of the background distance image 431 from the pixel value (distance value) of each pixel of the distance image 432, generates a difference image, and binarizes the generated difference image. (Step 701).
  • FIG. 8 shows an example of the distance image 432 and the background distance image 431.
  • FIG. 8A shows an example of the distance image 432
  • FIG. 8B shows an example of the background distance image 431. Since the pixel values of the distance image 432 and the background distance image 431 represent the distance from the imaging device 401 to the subject, the pixel values are smaller as the subject is closer to the imaging device 401 and larger as the subject is farther from the imaging device 401. In this case, the difference between the background pixel values common to the distance image 432 and the background distance image 431 is close to 0, but the difference between the foreground pixel values closer to the imaging device 401 than the background is a negative value.
  • The detection unit 211 compares the pixel value difference with a negative predetermined value used as the threshold T1; if the difference is less than T1, the pixel value of the difference image is set to 255 (white), and if the difference is equal to or greater than T1, the pixel value of the difference image is set to 0 (black). Thereby, the difference image is binarized.
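  • The background-difference binarization of step 701 can be sketched as follows in Python with NumPy; the function name and the concrete threshold value are placeholders, not taken from the patent.

```python
import numpy as np

def binarize_difference(distance_img, background_img, t1=-30):
    """Subtract the background distance image from the distance image and
    binarize the result: differences below the negative threshold t1 are
    treated as foreground (255, white), everything else as background (0).
    t1 = -30 is only an illustrative value."""
    diff = distance_img.astype(np.int32) - background_img.astype(np.int32)
    return np.where(diff < t1, 255, 0).astype(np.uint8)
```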
  • FIG. 9 shows an example of a binarized difference image.
  • FIG. 9A shows an example of a binary image generated in step 701.
  • When the difference image is binarized, not only the pixels showing both hands of the worker but also the pixels showing an object close to the hands are set to white.
  • The detection unit 211 performs opening and closing for each pixel of the binary image (step 702). First, by performing the opening, white pixels are shrunk and small white regions are removed. Thereafter, by performing the closing, black isolated points generated by the opening are changed back to white. Thereby, the binary image of FIG. 9B is generated from the binary image of FIG. 9A.
  • Next, the detection unit 211 selects each white pixel as a target pixel, and obtains the differences between the pixel value in the distance image 432 of the target pixel and the pixel values in the distance image 432 of the white pixels adjacent to the target pixel above, below, to the left, and to the right. Then, the detection unit 211 compares the maximum absolute value of these differences with a predetermined threshold T2, and when the maximum value is equal to or greater than T2, changes the target pixel from a white pixel to a black pixel (step 703). Thereby, the binary image of FIG. 9C is generated from the binary image of FIG. 9B.
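  • A rough Python sketch of steps 702 and 703 (morphological cleanup followed by removal of white pixels on large depth discontinuities) is shown below; the kernel size and the threshold T2 are placeholder values, and the pixel loop is written for clarity rather than speed.

```python
import cv2
import numpy as np

def clean_candidate_mask(binary, distance_img, t2=50):
    """Step 702: opening then closing. Step 703: for each white pixel, compare
    its distance value with its white 4-neighbours and turn the pixel black
    when the largest absolute difference is at least t2 (placeholder value)."""
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    depth = distance_img.astype(np.int32)
    result = mask.copy()
    h, w = mask.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mask[y, x] != 255:
                continue
            diffs = [abs(depth[y, x] - depth[ny, nx])
                     for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                     if mask[ny, nx] == 255]
            if diffs and max(diffs) >= t2:
                result[y, x] = 0
    return result
```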
  • the detection unit 211 performs contour detection processing to detect a plurality of white areas, and selects a white area that satisfies the following condition as a candidate area from the detected white areas (step 704).
  • the white area is in contact with the end of the binary image.
  • the area of the white region is not less than a predetermined value.
  • In this example, the worker's arm extends from the lower end (body side) of the image toward the upper end, and thus a white region in contact with the lower end of the binary image is selected as a candidate region.
  • In this case, one white region shown in FIG. 9D is selected from the binary image shown in FIG. 9C.
  • the detection unit 211 smoothes the outline of the selected white area (step 705).
  • the outline of the white region may be uneven due to the influence of wrinkles on clothes worn by the worker. Therefore, by performing the processing in step 705, such unevenness can be smoothed. For example, closing may be used as the smoothing process.
  • By the smoothing process, the binary image shown in FIG. 9E is generated from the binary image shown in FIG. 9D.
  • When the selected white region is in contact with the lower end of the binary image at two places, the detection unit 211 divides the white region into two white regions, each of which is in contact with the lower end at only one place (step 706).
  • the detection unit 211 generates region information 433 indicating the white region generated by the division.
  • For example, the detection unit 211 obtains, from the contour of the white region, the contour portion sandwiched between the two contour portions that are in contact with the lower end of the binary image, and obtains the x coordinate x1 of the topmost pixel (the pixel with the smallest y coordinate) of the obtained contour portion. Then, the detection unit 211 can divide the white region into two white regions by changing all white pixels whose x coordinate is x1 in the white region to black pixels.
  • the binary image shown in FIG. 9F is generated from the binary image shown in FIG. 9E.
  • the two white regions generated by the division correspond to a region including the left hand and a region including the right hand, respectively.
  • The detection unit 211 compares the x coordinates of the contour portions included in the two white regions, determines the white region with the smaller x coordinates to be the candidate region for the left hand, and determines the white region with the larger x coordinates to be the candidate region for the right hand. In this case, the detection unit 211 sets the variable nHands, which represents the number of detected candidate regions, to 2.
  • When two white regions, each touching the lower end of the binary image at only one place, are selected in step 704, the detection unit 211 sets nHands to 2 without dividing any white region. When one white region touching the lower end of the binary image at only one place is selected, the detection unit 211 sets nHands to 1.
  • The detection unit 211 may also use a positive predetermined value as the threshold T1, setting the pixel value of the difference image to 255 (white) when the difference is larger than T1 and to 0 (black) when the difference is equal to or less than T1.
  • The detection unit 211 may also select at most N white regions (N is an integer of 3 or more) in descending order of area.
  • FIG. 10 is a flowchart showing an example of the hand position detection process in step 602 of FIG.
  • the extraction unit 212 sets 0 to the control variable i indicating the i-th candidate area among the candidate areas indicated by the area information 433 (step 1001), and compares i with nHands (step 1002). When i is less than nHands (step 1002, YES), the line segment detection unit 421 performs a line segment detection process for the i-th candidate region (step 1003).
  • the determination unit 422 performs parameter calculation processing (step 1004), performs bending determination processing (step 1005), and performs length determination processing (step 1006). Then, the specifying unit 213 performs position specifying processing (step 1007).
  • the extraction unit 212 increments i by 1 (step 1008), and repeats the processing after step 1002.
  • When i reaches nHands (step 1002, NO), the extraction unit 212 ends the process.
  • FIG. 11 is a flowchart showing an example of line segment detection processing in step 1003 of FIG.
  • the line segment detection unit 421 thins the i-th candidate region (step 1101).
  • the line segment detection unit 421 can thin the candidate region using a thinning algorithm such as the Tamura method, the Zhang-Suen method, or the NWG method described in Non-Patent Document 1.
  • When a branch occurs during thinning, the line segment detection unit 421 leaves only the longest of the plurality of branches and generates a single curve composed of an array of consecutive points.
  • the line segment detector 421 approximates a curve with a plurality of connected line segments, and generates line segment information 434 indicating these line segments (step 1102).
  • For example, the line segment detection unit 421 can use the Ramer-Douglas-Peucker algorithm described in Non-Patent Document 2 to convert the curve into a plurality of approximate line segments while keeping the deviation between the curve and the approximate line segments within a predetermined tolerance.
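  • The following is a minimal sketch of the Ramer-Douglas-Peucker idea used in step 1102, assuming the thinned curve is given as an ordered list of (x, y) points; the tolerance epsilon corresponds to the allowable deviation mentioned above.

```python
import numpy as np

def rdp(points, epsilon):
    """Approximate an ordered polyline with fewer vertices so that no original
    point deviates from the approximation by more than epsilon pixels."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points.tolist()
    start, end = points[0], points[-1]
    chord = end - start
    norm = np.hypot(chord[0], chord[1]) or 1.0
    # Perpendicular distance of every point to the chord from start to end.
    dists = np.abs(chord[0] * (points[:, 1] - start[1])
                   - chord[1] * (points[:, 0] - start[0])) / norm
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    return [start.tolist(), end.tolist()]
```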
  • FIG. 12 shows an example of line segment information 434 in an array format.
  • The array element number is identification information indicating an end point of a line segment in the binary image, and the x coordinate and the y coordinate represent the coordinates (x, y) of that point, which are common to the binary image and the distance image 432.
  • the point corresponding to sequence number 0 is located at the lower end of the candidate region, and the corresponding point moves away from the lower end as the sequence number increases.
  • the distance value Z represents a pixel value corresponding to the coordinates (x, y) of the distance image 432.
  • the j-th line segment and the j + 1-th line segment are adjacent to each other and are connected at a point corresponding to the array element number j + 1.
  • the line segment detection unit 421 sets n + 1 to a variable nPts that represents the number of end points.
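  • The structure of the line segment information 434 can be pictured as follows; the coordinate and distance values in this sketch are invented solely to illustrate the array layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EndPoint:
    """One entry of the line segment information 434: an end point with its
    image coordinates and the distance value Z at that pixel."""
    x: int
    y: int
    z: float

# pts[j] and pts[j + 1] are the end points of the j-th line segment, so
# nPts end points describe nPts - 1 connected line segments.
pts: List[EndPoint] = [
    EndPoint(x=320, y=479, z=850.0),  # array element number 0 (lower end of the region)
    EndPoint(x=315, y=400, z=820.0),  # array element number 1
    EndPoint(x=300, y=330, z=790.0),  # array element number 2
]
nPts = len(pts)
```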
  • FIG. 13 is a flowchart showing an example of parameter calculation processing in step 1004 of FIG.
  • First, the determination unit 422 sets 0 to the control variable j representing the array element number of the line segment information 434 (step 1301), and compares j with nPts - 1 (step 1302). When j is less than nPts - 1 (step 1302, YES), the determination unit 422 calculates the length len j in the three-dimensional space of the j-th line segment whose end points correspond to the array element number j and the array element number j + 1 (step 1303).
  • the determination unit 422 can obtain the length len j by a pinhole camera model using the coordinates (x, y) of two points corresponding to the array element number j and the array element number j + 1.
  • When the pinhole camera model is used, the length len1 in the three-dimensional space of the line segment corresponding to the line segment between the point (x1, y1) and the point (x2, y2) in the binary image is calculated by the following equations.
  • X1 = (Z1 × (x1 - cx)) / fx (1)
  • Y1 = (Z1 × (y1 - cy)) / fy (2)
  • X2 = (Z2 × (x2 - cx)) / fx (3)
  • Y2 = (Z2 × (y2 - cy)) / fy (4)
  • len1 = ((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2)^(1/2) (5)
  • Z1 and Z2 represent the distance value Z of the point (x1, y1) and the point (x2, y2), respectively, and (cx, cy) represents the coordinates of the principal point in the binary image.
  • the center of the binary image is used as the principal point.
  • fx and fy are focal lengths expressed in units of pixels in the x-axis direction and the y-axis direction, respectively.
  • the origin of the coordinate system representing the coordinates (X, Y, Z) in the three-dimensional space may be the installation position of the imaging device 401.
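  • Equations (1) to (5) translate directly into the following Python sketch; the function name is arbitrary, and the camera parameters cx, cy, fx, and fy are those of the pinhole camera model described above.

```python
import math

def segment_length_3d(p1, p2, cx, cy, fx, fy):
    """Length len1 in the three-dimensional space of the segment between two
    image points, each given as (x, y, Z) with Z the distance value.
    cx, cy: principal point (e.g. the image centre); fx, fy: focal lengths
    in pixel units."""
    def back_project(x, y, z):
        # Equations (1) to (4): pixel coordinates plus distance -> 3-D point.
        return (z * (x - cx) / fx, z * (y - cy) / fy, z)

    X1, Y1, Z1 = back_project(*p1)
    X2, Y2, Z2 = back_project(*p2)
    # Equation (5): Euclidean distance between the two back-projected points.
    return math.sqrt((X1 - X2) ** 2 + (Y1 - Y2) ** 2 + (Z1 - Z2) ** 2)
```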
  • Next, the determination unit 422 compares j with nPts - 2 (step 1304). When j is less than nPts - 2 (step 1304, YES), the determination unit 422 calculates the angle θj between the j-th line segment and the (j + 1)-th line segment, whose end points correspond to the array element number j + 1 and the array element number j + 2 (step 1305).
  • For example, the determination unit 422 can obtain the angle θj by computing the inner product described in Non-Patent Document 3 using the coordinates (x, y) of the three points corresponding to the array element number j, the array element number j + 1, and the array element number j + 2.
  • Next, the determination unit 422 increments j by 1 (step 1306) and repeats the processing from step 1302. If j reaches nPts - 2 (step 1304, NO), the determination unit 422 skips the process of step 1305 and performs the processing from step 1306. When j reaches nPts - 1 (step 1302, NO), the determination unit 422 ends the process.
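  • The inner-product computation of step 1305 can be sketched as follows; the angle is taken at the shared point pts [j + 1], so a straight joint yields 180° and a sharp bend yields a small angle, which matches the use of θmin and θmax below.

```python
import math

def angle_at_joint(p_prev, p_mid, p_next):
    """Angle theta_j (degrees) at p_mid between the segments p_prev-p_mid and
    p_mid-p_next, computed from the inner product of the two vectors leaving
    p_mid (image coordinates only)."""
    v1 = (p_prev[0] - p_mid[0], p_prev[1] - p_mid[1])
    v2 = (p_next[0] - p_mid[0], p_next[1] - p_mid[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```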
  • FIG. 14 is a flowchart illustrating an example of the bending determination process in step 1005 of FIG.
  • In the bending determination process, line segments estimated to lie beyond the hand are excluded based on the determination result for the angle θj between adjacent line segments.
  • The maximum number of bends between the lower end of the candidate region, in which the arm and hand appear, and the hand is two, and the third and subsequent bends are estimated to be caused by fingers or by an object other than the hand. Even when the number of bends is two or less, if the angle θj is smaller than the movable angle of the elbow or wrist, the bend is estimated to be caused by a finger or an object.
  • First, the determination unit 422 sets the control variable j to 0, sets the variable Nbend representing the number of bends to 0 (step 1401), and compares j with nPts - 2 (step 1402). When j is less than nPts - 2 (step 1402, YES), the determination unit 422 compares θj with θmin (step 1403).
  • θmin may be determined based on Non-Patent Document 4, for example.
  • θmax is a threshold value for determining that there is no bend, and is set to a value larger than θmin and smaller than 180°.
  • For example, θmax may be an angle in the range of 150° to 170°.
  • When θj is equal to or greater than θmin and less than θmax, the determination unit 422 increments Nbend by 1 (step 1405) and compares Nbend with 2 (step 1406). When Nbend is 2 or less (step 1406, NO), the determination unit 422 increments j by 1 (step 1408) and repeats the processing from step 1402.
  • When Nbend is greater than 2 (step 1406, YES), the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434 and changes nPts from n + 1 to j + 2 (step 1407). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434.
  • When θj is less than θmin (step 1403, NO), the determination unit 422 performs the process of step 1407. Thereby, when θj is smaller than the movable angle of the elbow or wrist, the line segments corresponding to the fingers or the object are deleted from the line segment information 434.
  • When θj is equal to or greater than θmax, the determination unit 422 determines that the point of the array element number j + 1 does not correspond to a bend, skips the processing of step 1405 and step 1406, and performs the processing from step 1408.
  • When j reaches nPts - 2 (step 1402, NO), the determination unit 422 ends the process.
  • By the bending determination process, the number of line segments indicated by the line segment information 434 can be reduced, and the line segment candidates that approximate the hand region can be narrowed down.
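  • A compact sketch of the bending determination loop is given below; it mirrors the flow of FIG. 14, with the thresholds and the limit of two bends as placeholder parameters.

```python
def bend_filter(pts, theta, theta_min=90.0, theta_max=160.0, max_bends=2):
    """Walk along the end points pts of the approximated polyline; theta[j] is
    the angle (degrees) at pts[j + 1]. Cut the polyline after pts[j + 1] when
    the bend is sharper than a wrist or elbow can bend, or when a third bend
    is found. The numeric defaults are placeholders."""
    n_bends = 0
    for j in range(len(pts) - 2):
        if theta[j] < theta_min:
            # Sharper than any elbow or wrist bend: finger or held object.
            return pts[: j + 2]
        if theta[j] < theta_max:
            n_bends += 1
            if n_bends > max_bends:
                # Third bend: attributed to a finger or an object.
                return pts[: j + 2]
    return pts
```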
  • FIG. 15 shows an example of four line segments to be subjected to bending determination processing.
  • In FIG. 15, θj represents the angle between line segments calculated from the three points pts [j], pts [j + 1], and pts [j + 2].
  • a line segment with both ends of pts [0] and pts [1] corresponds to the upper arm
  • a line segment with both ends of pts [1] and pts [2] corresponds to the forearm
  • pts [1] corresponds to the elbow.
  • line segments having both ends of pts [2] and pts [3] correspond to the hand
  • pts [2] corresponds to the wrist.
  • a line segment having both ends of pts [3] and pts [4] corresponds to an object held by the hand
  • pts [3] corresponds to a finger joint.
  • FIG. 16 is a flowchart illustrating an example of the length determination process in step 1006 of FIG.
  • In the length determination process, line segments estimated to lie beyond the hand are excluded based on the determination results for the angle θj between line segments and the length len j of each line segment.
  • the determination unit 422 sets the control variable j to 0, sets the variable len representing the length in the three-dimensional space to 0, and sets the variable ctr representing the number of line segments to 0 (step 1601).
  • Then, j is compared with nPts - 1 (step 1602).
  • When j is less than nPts - 1 (step 1602, YES), the determination unit 422 adds len j to len (step 1603) and compares θj with θmax (step 1604).
  • When θj is less than θmax (step 1604, NO), the determination unit 422 compares len with len max (step 1605).
  • len max is an upper limit value of the length of the forearm, and can be determined based on the length of the forearm described in Non-Patent Document 4, for example. Further, when the height of the worker is known, len max may be determined from the height based on the Vitruvian human figure of Leonardo da Vinci.
  • When len is greater than len max in step 1605, the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434 and changes nPts to j + 2 (step 1606). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434. Then, the determination unit 422 performs the normal line determination processing of steps 1613 to 1620.
  • When θj is equal to or greater than θmax in step 1604, the determination unit 422 determines that the point of the array element number j + 1 does not correspond to a bend, increments j by 1 (step 1611), and repeats the processing from step 1602. If j reaches nPts - 1 (step 1602, NO), the determination unit 422 performs the normal line determination processing of steps 1613 to 1620.
  • When len is equal to or less than len max in step 1605, the determination unit 422 compares len with len min (step 1607). When len is equal to or greater than len min (step 1607, NO), the determination unit 422 sets len to 0 (step 1612) and performs the processing from step 1611.
  • When len is less than len min (step 1607, YES), the determination unit 422 increments ctr by 1 (step 1608) and compares ctr with 2 (step 1609). When ctr is 2 or less (step 1609, NO), the determination unit 422 performs the processing from step 1612.
  • When ctr is greater than 2 (step 1609, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 1610). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434. Then, the determination unit 422 performs the normal line determination processing of steps 1613 to 1620.
  • FIG. 17 shows an example of four line segments to be subjected to length determination processing.
  • len 0 corresponds to the length of a part of the upper arm
  • len 1 corresponds to the length of the forearm
  • len 2 corresponds to the length of the hand
  • len 3 corresponds to the length of the object held by the hand.
  • FIG. 18 shows an example of three line segments to be subjected to length determination processing.
  • len 0 corresponds to the length of a part of the forearm
  • len 1 corresponds to the length of the hand
  • len 2 corresponds to the length of the object held by the hand.
  • In the normal line determination processing of steps 1613 to 1620, a straight line passing through each of a plurality of points on the line segment whose distance to the lower end of the candidate region is the greatest is drawn, the two intersection points of each straight line and the contour of the candidate region are obtained, and the portion approximating the hand region is extracted from the line segment based on those intersection points.
  • the line segment with the longest distance to the lower end of the candidate area is a line segment having both ends of pts [nPts-2] and pts [nPts-1], and is estimated to be a line segment corresponding to the hand.
  • The determination unit 422 divides the line segment L having pts [nPts-2] and pts [nPts-1] as its end points at intervals of a predetermined number of pixels to obtain m points (m is an integer of 2 or more) on the line segment L, and obtains a normal line perpendicular to the line segment L at each point (step 1613). Then, the determination unit 422 calculates the intersections between each of the m normal lines and the contour of the candidate region. Since the line segment L exists within the candidate region and the contour of the candidate region exists on both sides of the line segment L, two intersections are obtained between each normal line and the contour.
  • the determination unit 422 sets 0 to a control variable k indicating one normal line (step 1614), and compares k and m (step 1615).
  • When k is less than m (step 1615, YES), the determination unit 422 obtains the distance n_len k between the two intersections of the k-th normal line and the contour, and compares n_len k with n_len max (step 1617). n_len max is an upper limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.
  • When n_len k is equal to or smaller than n_len max (step 1617, NO), the determination unit 422 compares n_len k with n_len min (step 1618).
  • n_len min is a lower limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.
  • When n_len k is equal to or greater than n_len min (step 1618, NO), the determination unit 422 increments k by 1 (step 1620) and repeats the processing from step 1615.
  • When k reaches m (step 1615, NO), the determination unit 422 ends the process.
  • In step 1619, the determination unit 422 obtains the intersection between the k-th normal line and the line segment L, and records the obtained intersection as pts [nPts-1]′. As a result, the portion of the line segment L from the point pts [nPts-2] to the point pts [nPts-1]′ is extracted as the portion approximating the hand region. Then, the determination unit 422 generates hand region line segment information 435 including the x coordinates, the y coordinates, and the distance values Z of pts [nPts-2] and pts [nPts-1]′.
  • When n_len k is less than n_len min (step 1618, YES), the determination unit 422 performs the process of step 1619.
  • FIG. 19 shows an example of a line segment to be subjected to normal line determination processing.
  • By the normal line determination process, the portion corresponding to an object that was not removed by the bending determination process can be removed, and the portion that approximates the hand region on the identified line segment can be specified.
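  • The following Python sketch captures the spirit of the normal line determination: it samples points along the candidate hand segment, measures the region width along the normal at each point, and cuts the segment where the width falls below a plausible hand width. Measuring the width by marching through the binary mask and converting pixels to millimetres with a fixed scale is a simplification of the patent's use of three-dimensional distances; the width bound, step, and function name are placeholders.

```python
import numpy as np

def trim_to_hand(mask, p_start, p_end, px_per_mm, wmin_mm=40.0, step=5):
    """mask: binary candidate region (255 inside); p_start, p_end: the two end
    points pts[nPts-2] and pts[nPts-1] as (x, y). Returns the trimmed segment."""
    p0, p1 = np.asarray(p_start, float), np.asarray(p_end, float)
    d = p1 - p0
    length = np.hypot(d[0], d[1])
    u = d / length                       # unit vector along the segment
    n = np.array([-u[1], u[0]])          # unit normal to the segment

    def width_at(p):
        # March outward along +n and -n until the candidate region is left.
        w = 0
        for s in (n, -n):
            t = 0
            while True:
                q = np.round(p + s * (t + 1)).astype(int)
                if not (0 <= q[1] < mask.shape[0] and 0 <= q[0] < mask.shape[1]):
                    break
                if mask[q[1], q[0]] != 255:
                    break
                t += 1
            w += t
        return w / px_per_mm             # crude pixel-to-millimetre conversion

    for k in range(int(length // step) + 1):
        p = p0 + u * k * step
        if width_at(p) < wmin_mm:        # narrower than a hand: cut here
            return p0, p
    return p0, p1
```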
  • FIG. 20 is a flowchart showing an example of the position specifying process in step 1007 of FIG.
  • the center position of the palm is specified by obtaining the maximum inscribed circle inscribed in the outline of the candidate area in the hand area corresponding to the line segment indicated by the hand area line segment information 435.
  • the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the closed finger area. Further, since the width of the palm is wider than the width of the wrist, the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the region of the wrist. Therefore, the center of the maximum inscribed circle in the hand region can be regarded as the center of the palm.
  • The specifying unit 213 sets a scanning range between the point pts [nPts-2] and the point pts [nPts-1]′, and obtains, for each scanning point in the scanning range, the inscribed circle that is centered on the scanning point and inscribed in the contour of the candidate region (step 2001).
  • the specifying unit 213 obtains the coordinates (xp, yp) of the center of the largest inscribed circle among the inscribed circles centered on each of the plurality of scanning points.
  • The scanning range may be the line segment having the points pts [nPts-2] and pts [nPts-1]′ as its end points, or may be a region obtained by expanding that line segment by a predetermined number of pixels in the x direction and the y direction.
  • the specifying unit 213 obtains the minimum value d of the distance from each scanning point to the contour of the candidate area, and obtains the coordinates (xmax, ymax) of the scanning point at which the minimum value d is maximum. Then, the specifying unit 213 records the coordinates (xmax, ymax) as the coordinates (xp, yp) of the center of the maximum inscribed circle. In this case, the minimum distance d at the scanning point (xmax, ymax) is the radius of the maximum inscribed circle.
  • FIG. 21 shows an example of the maximum inscribed circle.
  • In FIG. 21, the line segment 2101 having the points pts [nPts-2] and pts [nPts-1]′ as its end points is set as the scanning range, and the maximum inscribed circle 2103 centered on the point 2102 on the line segment 2101 is obtained.
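  • Step 2001 can be sketched with OpenCV's pointPolygonTest, which returns the signed distance from a point to a contour; the scan point with the largest distance is the centre of the maximum inscribed circle, and that distance is its radius. The sampling density and function name are placeholders.

```python
import cv2
import numpy as np

def palm_center(contour, p_a, p_b, samples=50):
    """contour: candidate-region contour from cv2.findContours; p_a, p_b: the
    end points of the scanning segment (pts[nPts-2] and pts[nPts-1]').
    Returns the palm centre (xp, yp) and the maximum inscribed radius."""
    best_center, best_radius = None, -1.0
    for t in np.linspace(0.0, 1.0, samples):
        p = (1.0 - t) * np.asarray(p_a, float) + t * np.asarray(p_b, float)
        # Positive distance means the point lies inside the contour.
        d = cv2.pointPolygonTest(contour, (float(p[0]), float(p[1])), True)
        if d > best_radius:
            best_radius, best_center = d, p
    return best_center, best_radius
```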
  • Next, the specifying unit 213 obtains the point in the three-dimensional space corresponding to the center of the maximum inscribed circle, and generates position information 436 indicating the position of the obtained point (step 2002). For example, the specifying unit 213 can obtain the coordinates (Xp, Yp, Zp) of the point in the three-dimensional space by the pinhole camera model, using the coordinates (xp, yp) of the center of the maximum inscribed circle and the distance value Zp. In this case, Xp and Yp are calculated in the same manner as equations (1) to (4), that is, Xp = (Zp × (xp - cx)) / fx and Yp = (Zp × (yp - cy)) / fy.
  • FIG. 22 shows a second specific example of the image processing apparatus 201 of FIG.
  • the image processing apparatus 201 in FIG. 22 has a configuration in which a detection unit 2201 and an upper limb detection unit 2202 are added to the image processing apparatus 201 in FIG. 4, and obtains the position of the hand and the upper limb from the distance image 432.
  • FIG. 23 shows an example of the visual field range of the imaging device 401 of FIG.
  • The imaging device 401 in FIG. 23 is installed above the work table 102, and the visual field range 2301 includes the work table 102, the upper body 2302 of the worker 101 including the left hand 502 and the right hand 503, and the object 2303 gripped by the right hand 503. Therefore, the work table 102, the upper body 2302, and the object 2303 are shown in the distance image 432 captured by the imaging device 401.
  • the XW axis, YW axis, and ZW axis represent the world coordinate system, and the origin O is provided on the floor of the work place.
  • the XW axis is provided in parallel with the long side of the work table 102 and represents the direction from the left shoulder to the right shoulder of the worker 101.
  • the YW axis is provided in parallel with the short side of the work table 102 and represents the direction from the front to the back of the worker 101.
  • the ZW axis is provided perpendicular to the floor surface and represents a direction from the floor surface toward the top of the worker 101.
  • the detection unit 2201 detects a body region from the distance image 432 using the background distance image 431 and stores region information 2211 indicating the detected body region in the storage unit 412.
  • the body region is a region where the upper body 2302 of the worker 101 is estimated to be captured.
  • The upper limb detection unit 2202 identifies the position of the upper limb, including the wrist, elbow, or shoulder, using the distance image 432, the region information 2211, and the center position of the palm, and stores position information 2212 indicating the identified position in the storage unit 412.
  • the output unit 411 outputs a recognition result based on the position information 436 and the position information 2212.
  • the recognition result may be a trajectory indicating a change in the three-dimensional position of the hand and the upper limb, or may be information indicating an operator's motion estimated from the trajectory of the hand and the upper limb.
  • According to the image processing apparatus 201 in FIG. 22, it is possible to recognize the motion of the worker in consideration of not only the movement of the hand but also the movement of the upper limb, and the recognition accuracy of the worker's motion is further improved.
  • FIG. 24 is a flowchart illustrating a specific example of image processing performed by the image processing apparatus 201 in FIG.
  • the distance image 432 is divided into two areas, a hand detection area and a body detection area.
  • FIG. 25 shows an example of a distance image 432 and a background distance image 431 obtained by photographing the visual field range 2301 of FIG.
  • FIG. 25A shows an example of the distance image 432
  • FIG. 25B shows an example of the background distance image 431.
  • the distance image 432 is divided into a hand detection area 2501 and a body detection area 2502, and the background distance image 431 is similarly divided into two areas.
  • First, the detection unit 2201 performs body region detection processing using the body detection region of the distance image 432 (step 2401), and the detection unit 211 performs candidate region detection processing using the hand detection region of the distance image 432 (step 2402).
  • the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 2403), and the upper limb detection unit 2202 performs upper limb position detection processing (step 2404).
  • FIG. 26 is a flowchart showing an example of body region detection processing in step 2401 of FIG.
  • The detection unit 2201 subtracts the pixel value of each pixel in the body detection region of the background distance image 431 from the pixel value of each pixel in the body detection region of the distance image 432, generates a difference image, and binarizes the generated difference image (step 2601).
  • the detection unit 2201 performs opening and closing for each pixel of the binary image (step 2602). Then, the detection unit 2201 extracts, as a body region, a white region having the maximum area among the white regions included in the binary image (step 2603), and obtains the center of gravity of the body region (step 2604). For example, the detection unit 2201 can obtain the coordinates of the center of gravity of the body region by calculating the average value of the x and y coordinates of each of the plurality of pixels included in the body region.
  • Next, the detection unit 2201 obtains the position of the head shown in the body region (step 2605). For example, the detection unit 2201 generates a histogram of the distance values of the plurality of pixels included in the body region, and determines from the generated histogram a threshold THD such that the number of pixels having a distance value equal to or less than THD is greater than or equal to a predetermined number. Next, the detection unit 2201 selects the largest region among the regions composed of pixels having a distance value equal to or less than the threshold THD as the head region, and obtains the coordinates of the center of gravity of the head region as the head position. Then, the detection unit 2201 generates region information 2211 indicating the coordinates of the centers of gravity of the body region and the head region.
  • the detection unit 211 performs a candidate area detection process similar to that of FIG. 7 using the hand detection areas of the distance image 432 and the background distance image 431.
  • FIG. 27 is a flowchart showing an example of the hand position detection process in step 2403 of FIG.
  • The processing in step 2701 to step 2706, step 2708, and step 2709 in FIG. 27 is the same as the processing in step 1001 to step 1008 in FIG. 10.
  • the determination unit 422 performs a three-dimensional direction determination process after performing the length determination process (step 2707).
  • FIG. 28 is a flowchart showing an example of the three-dimensional direction determination process in step 2707 of FIG. 27. In the three-dimensional direction determination process, line segments presumed to lie beyond the hand are excluded based on the result of comparing the direction in the three-dimensional space of each line segment with the direction in the three-dimensional space of the subject shown in the region including that line segment.
  • First, the determination unit 422 sets 0 to the control variable j (step 2801) and compares j with nPts - 1 (step 2802). When j is less than nPts - 1 (step 2802, YES), the determination unit 422 obtains the coordinates (X j , Y j , Z j ) in the three-dimensional space of pts [j] and the coordinates (X j + 1 , Y j + 1 , Z j + 1 ) in the three-dimensional space of pts [j + 1] (step 2803). For example, the determination unit 422 can obtain (X j , Y j , Z j ) and (X j + 1 , Y j + 1 , Z j + 1 ) using the pinhole camera model.
  • Next, using (X j , Y j , Z j ) and (X j + 1 , Y j + 1 , Z j + 1 ), the determination unit 422 obtains a direction vector V j indicating the direction in the three-dimensional space of the line segment L j having pts [j] and pts [j + 1] as its end points (step 2804).
  • the determination unit 422 can determine a vector from (X j , Y j , Z j ) to (X j + 1 , Y j + 1 , Z j + 1 ) in the three-dimensional space as V j. .
  • the determination unit 422 may obtain the coordinates in the three-dimensional space of each of a plurality of points on the line segment L j and obtain a vector that approximates a curve connecting these coordinates as V j .
  • the determination unit 422 sets a peripheral region of the line segment L j in the candidate region, obtains coordinates in a three-dimensional space of each of a plurality of points in the peripheral region, and performs principal component analysis on these coordinates ( Step 2805). Then, the determination unit 422 obtains the first principal component vector EV j from the result of the principal component analysis (step 2806). EV j indicates the direction in the three-dimensional space of the subject shown in the area including the line segment L j .
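  • Steps 2805 to 2807 can be sketched as below with NumPy; taking the absolute value of the dot product when computing the angle is an assumption made here because the sign of a principal component vector is arbitrary.

```python
import numpy as np

def first_principal_component(points_3d):
    """First principal component (dominant 3-D direction) of the points in the
    peripheral region of a line segment, as in steps 2805-2806."""
    pts = np.asarray(points_3d, dtype=float)
    centered = pts - pts.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

def angle_between(v, ev):
    """Angle (degrees) between the segment direction V_j and EV_j (step 2807);
    the absolute value ignores the arbitrary sign of EV_j."""
    cosang = abs(np.dot(v, ev)) / (np.linalg.norm(v) * np.linalg.norm(ev))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
```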
  • In FIG. 29, the line segment L 3 is the line segment having pts [3] and pts [4] as its end points. The determination unit 422 obtains a normal line passing through pts [3] and a normal line passing through pts [4], and can set the region 2901 surrounded by the two normal lines and the contour of the candidate region as the peripheral region of the line segment L 3 .
  • The determination unit 422 obtains the angle γj between V j and EV j (step 2807), and compares γj with a threshold γth (step 2808). When γj is less than γth (step 2808, NO), the determination unit 422 increments j by 1 (step 2810) and repeats the processing from step 2802. If j reaches nPts - 1 (step 2802, NO), the determination unit 422 ends the process.
  • When γj is equal to or greater than γth (step 2808, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 2809). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434.
  • FIG. 30 shows an example of the first principal component vectors EV 0 to EV 3 .
  • EV 0 to EV 2 are obtained by principal component analysis on the peripheral regions of the line segments L 0 to L 2 , and correspond to the upper arm, forearm, and hand regions, respectively.
  • The directions of EV 0 to EV 2 are close to the directions of the direction vectors V 0 to V 2 of L 0 to L 2 .
  • On the other hand, the direction of EV 3 , which corresponds to the region of the object 3001 held by the hand, differs significantly from the direction of the direction vector V 3 of the line segment L 3 , and the angle γ3 between V 3 and EV 3 becomes larger than γth. Therefore, pts [4] is deleted from the line segment information 434, and nPts is changed from 5 to 4. Thereby, the line segment corresponding to the object 3001 is deleted.
  • FIG. 31 is a flowchart showing an example of the upper limb position detection process in step 2404 of FIG.
  • the upper limb detection unit 2202 obtains the position of the center of gravity of the head region indicated by the region information 2211 in the world coordinate system (step 3101).
  • the upper limb detection unit 2202 can obtain coordinates in a three-dimensional space using a pinhole camera model using the coordinates of the center of gravity of the head region and the distance value.
  • the upper limb detection unit 2202 converts the obtained coordinates into coordinates (XWH, YWH, ZWH) in the world coordinate system illustrated in FIG. 23 using an RT matrix based on the external parameters of the imaging device 401.
  • the external parameters of the imaging device 401 include the height from the floor surface to the installation position of the imaging device 401 and the tilt angle of the imaging device 401, and the RT matrix includes the three-dimensional coordinate system and the world coordinates with the imaging device 401 as the origin. Represents rotation and translation between systems.
  • the obtained ZWH represents the height from the floor surface to the top of the head, and corresponds to the approximate height of the operator.
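  • The conversion into the world coordinate system using the RT matrix can be sketched as follows; the specific rotation axis and axis conventions chosen here are assumptions for illustration, since the patent only states that the RT matrix is built from the installation height and tilt angle of the imaging device 401.

```python
import numpy as np

def camera_to_world(p_cam, install_height_mm, tilt_deg):
    """Map a point from the camera coordinate system (origin at the imaging
    device 401) to the world coordinate system of FIG. 23 with a 4x4
    rotation-translation matrix built from the external parameters."""
    t = np.radians(tilt_deg)
    # Assumed convention: rotate about the camera X axis by the tilt angle,
    # then translate up by the installation height above the floor.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t), np.cos(t)]])
    rt = np.eye(4)
    rt[:3, :3] = R
    rt[:3, 3] = np.array([0.0, 0.0, install_height_mm])
    return (rt @ np.append(np.asarray(p_cam, float), 1.0))[:3]
```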
  • the upper limb detection unit 2202 obtains coordinates in the three-dimensional space of the center of gravity of the body area indicated by the area information 2211 and converts the obtained coordinates into coordinates (XWB, YWB, ZWB) in the world coordinate system using the RT matrix. (Step 3102).
  • Next, the upper limb detection unit 2202 obtains the approximate positions of both shoulders in the world coordinate system using ZWH as the height (step 3103). For example, the upper limb detection unit 2202 determines the ratio of the height of the shoulders to the body height and the ratio of the shoulder width to the body height based on the Vitruvian human figure, and can obtain the height (ZW coordinate) ZWLS of the left shoulder, the height ZWRS of the right shoulder, and the shoulder width SW.
  • Then, the upper limb detection unit 2202 obtains XWB - SW / 2 as the lateral position (XW coordinate) XWLS of the left shoulder, obtains XWB + SW / 2 as the lateral position XWRS of the right shoulder, and uses YWB as the YW coordinate of both the left shoulder and the right shoulder.
  • the upper limb detection unit 2202 may use XWH and YWH instead of XWB and YWB.
  • the upper limb detection unit 2202 sets 0 to the control variable i indicating the i-th candidate region (step 3104), and compares i with nHands (step 3105). When i is less than nHands (step 3105, YES), the upper limb detection unit 2202 obtains the wrist position based on the palm center position obtained by the specifying unit 213 in the i-th candidate region (step 3106). . Then, the upper limb detection unit 2202 converts the coordinates of the wrist position into the coordinates of the world coordinate system using the RT matrix.
  • For example, among pts [0] to pts [nPts-1] indicated by the line segment information 434, the upper limb detection unit 2202 can determine, as the wrist position, the point that is closer to the lower end (body side) of the candidate region than the center position of the palm and is closest to that center position.
  • the upper limb detection unit 2202 may determine a point on the line segment that exists at a position away from the center position of the palm by a predetermined distance as the position of the wrist.
  • FIG. 32 shows an example of the position of the wrist.
  • In FIG. 32, among pts [0] to pts [2], which are closer to the lower end of the candidate region than the palm center position 3201, pts [2], which is closest to the palm center position 3201, is determined as the wrist position.
  • Next, the upper limb detection unit 2202 obtains the elbow position based on the wrist position in the i-th candidate region, and converts the coordinates of the elbow position into coordinates in the world coordinate system using the RT matrix (step 3107).
  • For example, the upper limb detection unit 2202 determines the ratio of the forearm length to the height based on the Vitruvian human figure, obtains the forearm length from ZWH, and can convert it into the length of the forearm on the image in the candidate region.
  • The upper limb detection unit 2202 obtains a point separated from the wrist position by the length of the forearm on the image toward the side closer to the lower end of the candidate region, and determines, as the elbow position, the point among pts [0] to pts [nPts-1] that exists within a predetermined error range from the obtained point. For example, in the example of FIG. 32, pts [1] is determined as the elbow position.
  • When no point indicated by the line segment information 434 exists within the predetermined error range, the upper limb detection unit 2202 may obtain the intersection between a circle centered on the wrist position, with the forearm length on the image as its radius, and the line segments indicated by the line segment information 434, and determine the obtained intersection as the elbow position. When the elbow is not shown in the hand detection region of the distance image 432 and there is no intersection between the circle and the line segments, the upper limb detection unit 2202 determines, as the elbow position, the point on the extension line of the line segment connecting the wrist position and pts [0] that is separated from the wrist position by the length of the forearm on the image.
  • the upper limb detection unit 2202 corrects the position of the shoulder based on the position of the elbow in the world coordinate system (step 3108). For example, the upper limb detection unit 2202 can determine the ratio of the length of the upper arm to the height based on the Vitruvian human figure, and can determine the length of the upper arm from the ZWH. Then, the upper limb detection unit 2202 obtains a three-dimensional distance UALen between the elbow coordinates and the shoulder coordinates in the world coordinate system.
  • When UALen differs from the length of the upper arm, the upper limb detection unit 2202 moves the position of the shoulder along the three-dimensional straight line connecting the elbow coordinates and the shoulder coordinates so that UALen matches the length of the upper arm.
  • Otherwise, the upper limb detection unit 2202 does not correct the coordinates of the shoulder. Then, the upper limb detection unit 2202 generates position information 2212 indicating the positions of the wrist, elbow, and shoulder in the world coordinate system.
  • the upper limb detection unit 2202 increments i by 1 (step 3109), and repeats the processing after step 3105.
  • When i reaches nHands (step 3105, NO), the upper limb detection unit 2202 ends the process.
  • the upper limb detection unit 2202 may use a predetermined value set in advance as the height instead of ZWH.
  • FIG. 33 shows a functional configuration example of an image processing system including the image processing apparatus 201 of FIG. 4 or FIG.
  • the image processing system in FIG. 33 includes an image processing device 201 and an image processing device 3301.
  • a transmission unit 3311 in the image processing apparatus 201 corresponds to the output unit 411 in FIG. 4 or FIG. 22, and transmits a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212 via a communication network.
  • the image processing apparatus 201 generates a plurality of recognition results in each of a plurality of time zones, and transmits the recognition results to the image processing apparatus 3301 in time series.
  • the image processing device 3301 includes a reception unit 3321, a display unit 3322, and a storage unit 3323.
  • the receiving unit 3321 receives a plurality of recognition results in time series from the image processing apparatus 201, and the storage unit 3323 stores the received plurality of recognition results in association with each of a plurality of time zones.
  • the display unit 3322 displays the time-series recognition results stored in the storage unit 3323 on the screen.
  • the image processing apparatus 201 may be a server installed at a work site such as a factory, or may be a server on the cloud that communicates with the imaging apparatus 401 via a communication network.
  • the image processing device 3301 may be a server or a terminal device of an administrator who monitors the operation of the worker.
  • the functions of the image processing apparatus 201 in FIG. 4 or 22 can be distributed and implemented in a plurality of apparatuses connected via a communication network.
  • the detection unit 211, the extraction unit 212, the specifying unit 213, the detection unit 2201, and the upper limb detection unit 2202 may be provided in different devices.
  • the configuration of the image processing apparatus 201 in FIGS. 2, 4, and 22 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing apparatus 201.
  • the configuration of the image processing system in FIG. 33 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing system.
  • When the process is simplified, the processes of step 702, step 703, and step 705 can be omitted.
  • the detection unit 211 may select a white region that is in contact with the upper end, the left end, or the right end of the binary image as a candidate region.
  • either the bending determination process in step 1005 or the length determination process in step 1006 may be omitted.
  • the specifying unit 213 may determine the midpoint of the line segment indicated by the hand region line segment information 435 as the center position of the palm instead of the center of the maximum inscribed circle.
  • the process of step 2602 can be omitted.
  • in the hand position detection process of FIG. 27, either the bending determination process in step 2705 or the length determination process in step 2706 may be omitted.
  • the determination unit 422 may determine a vector indicating the direction in the three-dimensional space of the subject shown in the region including the line segment L j by a method other than the principal component analysis.
  • the upper limb detection unit 2202 does not need to obtain all positions of the shoulder, wrist, and elbow.
  • the upper limb detection unit 2202 may generate position information 2212 indicating the position of any one of a shoulder, a wrist, and an elbow.
  • the installation position of the three-dimensional distance sensor in FIG. 1 and the installation positions of the imaging device in FIGS. 5 and 23 are merely examples, and the three-dimensional distance sensor or the imaging device may be installed at a position from which the operator can be photographed from another angle.
  • the distance image and the background distance image change according to the subject existing in the visual field range of the imaging apparatus.
  • the hand detection area and the body detection area in FIG. 25 are merely examples, and a hand detection area and a body detection area having different positions or different shapes may be used.
  • the line segment information in FIG. 12 and the line segments in FIGS. 15, 17 to 19, and 32 are merely examples, and the line segment information and the line segments change according to the captured distance image.
  • the maximum inscribed circle in FIG. 21, the peripheral region in FIG. 29, and the vector of the first principal component in FIG. 30 are merely examples; the maximum inscribed circle, the peripheral region, and the vector of the first principal component change according to the captured distance image.
  • peripheral regions of other shapes may be used.
  • FIG. 34 shows a configuration example of an information processing apparatus (computer) used as the image processing apparatus 201 in FIGS. 2, 4, and 22.
  • the information processing apparatus in FIG. 34 includes a central processing unit (CPU) 3401, a memory 3402, an input device 3403, an output device 3404, an auxiliary storage device 3405, a medium driving device 3406, and a network connection device 3407. These components are connected to each other by a bus 3408.
  • the imaging device 401 may be connected to the bus 3408.
  • the memory 3402 is a semiconductor memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), and a flash memory, and stores programs and data used for processing.
  • the memory 3402 can be used as the storage unit 412.
  • the CPU 3401 executes a program using the memory 3402 and thereby operates as the detection unit 211, the extraction unit 212, the specifying unit 213, the line segment detection unit 421, the determination unit 422, the detection unit 2201, and the upper limb detection unit 2202.
  • the input device 3403 is, for example, a keyboard, a pointing device, etc., and is used for inputting an instruction or information from an operator or a user.
  • the output device 3404 is, for example, a display device, a printer, a speaker, or the like, and is used to output an inquiry or processing result to the operator or the user.
  • the processing result may be a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212.
  • the output device 3404 can be used as the output unit 411.
  • the auxiliary storage device 3405 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, a tape device, or the like.
  • the auxiliary storage device 3405 may be a flash memory or a hard disk drive.
  • the information processing apparatus can store a program and data in the auxiliary storage device 3405 and load them into the memory 3402 for use.
  • the auxiliary storage device 3405 can be used as the storage unit 412.
  • the medium driving device 3406 drives a portable recording medium 3409 and accesses the recorded contents.
  • the portable recording medium 3409 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like.
  • the portable recording medium 3409 may be a Compact Disk Read Only Memory (CD-ROM), a Digital Versatile Disk (DVD), a Universal Serial Bus (USB) memory, or the like. An operator or user can store programs and data in the portable recording medium 3409 and load them into the memory 3402 for use.
  • the computer-readable recording medium that stores the programs and data used for the processing, such as the memory 3402, the auxiliary storage device 3405, or the portable recording medium 3409, is a physical (non-transitory) recording medium.
  • the network connection device 3407 is a communication interface that is connected to a communication network such as a local area network or a wide area network, and performs data conversion accompanying communication.
  • the information processing apparatus can receive a program and data from an external apparatus via the network connection apparatus 3407, and can use them by loading them into the memory 3402.
  • the network connection device 3407 can be used as the output unit 411 or the transmission unit 3311.
  • the information processing apparatus does not have to include all the components shown in FIG. 34, and some components may be omitted depending on the application or conditions. For example, if it is not necessary to input an instruction or information from an operator or user, the input device 3403 may be omitted. When the portable recording medium 3409 or the communication network is not used, the medium driving device 3406 or the network connection device 3407 may be omitted.
  • the information processing apparatus in FIG. 34 can also be used as the image processing apparatus 3301 in FIG. 33. In this case, the network connection device 3407 is used as the reception unit 3321, the memory 3402 or the auxiliary storage device 3405 is used as the storage unit 3323, and the output device 3404 is used as the display unit 3322.
  • (Appendix 1) An image processing apparatus comprising: a detection unit that detects, from a distance image, a candidate region in which a hand is captured; an extraction unit that extracts a line segment that approximates a hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; and a specifying unit that specifies the position of the hand in the candidate region using the line segment that approximates the hand region.
  • (Appendix 2) The image processing apparatus according to appendix 1, wherein the detection unit detects a region in contact with an end of the distance image as the candidate region, and the extraction unit, when the angle is smaller than a first threshold, excludes a line segment whose distance to the end is farther than the two line segments from the line segment candidates that approximate the hand region, and extracts the line segment whose distance to the end is the farthest among the remaining line segments as the line segment that approximates the hand region.
  • (Appendix 3) The image processing apparatus according to appendix 2, wherein, when an angle between the second line segment and the third line segment is included in the angle range and an angle between the third line segment and the fourth line segment is included in the angle range, the extraction unit excludes the fourth line segment from the line segment candidates that approximate the hand region.
  • (Appendix 4) The image processing apparatus wherein the extraction unit determines whether to exclude the line segment whose distance to the end is farther, of the two line segments, from the line segment candidates that approximate the hand region, using the length of a line segment corresponding to the line segment whose distance to the end is shorter, of the two line segments.
  • (Appendix 5) The image processing apparatus according to appendix 3 or 4, wherein the extraction unit obtains two intersections between the contour of the candidate region and a straight line passing through each of a plurality of points on the line segment whose distance to the end is the farthest, and extracts, from that line segment, a portion that approximates the hand region based on the distance in the three-dimensional space corresponding to the distance between the two intersections, and the specifying unit specifies the position of the hand using the portion that approximates the hand region.
  • (Appendix 6) The image processing apparatus according to any one of appendices 3 to 5, wherein the extraction unit obtains the direction in the three-dimensional space of the line segment whose distance to the end is farther and the direction in the three-dimensional space of the subject captured in a region including that line segment, and excludes that line segment from the line segment candidates that approximate the hand region when the angle between the two directions is larger than a third threshold.
  • (Appendix 7) The image processing apparatus according to any one of appendices 1 to 6, wherein the extraction unit obtains a curve by thinning the candidate region and obtains a plurality of line segments approximating the curve as the plurality of line segments approximating the candidate region.
  • (Appendix 8) The image processing apparatus according to any one of appendices 1 to 7, further comprising an upper limb detection unit that obtains the position of a wrist, an elbow, or a shoulder using the distance image and the position of the hand.
  • (Appendix 9) The image processing apparatus according to any one of appendices 2 to 8, wherein the first threshold represents a lower limit of the bending angle of a wrist or elbow joint.
  • (Appendix 10) An image processing system comprising: a detection unit that detects, from a distance image, a candidate region in which a hand is captured; an extraction unit that extracts a line segment that approximates a hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; a specifying unit that specifies the position of the hand in the candidate region; and a display unit that displays position information indicating the position of the hand.
  • (Appendix 11) An image processing program for causing a computer to execute a process comprising: detecting, from a distance image, a candidate region in which a hand is captured; extracting a line segment that approximates a hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; and specifying the position of the hand in the candidate region using the line segment that approximates the hand region.
  • (Appendix 12) The image processing program according to appendix 11, wherein the computer detects a region in contact with an end of the distance image as the candidate region, and, when the angle is smaller than a first threshold, excludes a line segment whose distance to the end is farther than the two line segments from the line segment candidates that approximate the hand region, and extracts the line segment whose distance to the end is the farthest among the remaining line segments as the line segment that approximates the hand region.

Abstract

[Problem] To identify with good precision the position of a hand gripping an object from a distance image of the hand. [Solution] A detection unit 211 detects from a distance image a candidate region in which a hand is captured. On the basis of angles between two adjacent line segments among a plurality of line segments that approximate the candidate region, an extraction unit 212 extracts line segments which approximate a hand region from among the plurality of line segments that approximate the candidate region. Using the line segments that approximate the hand region, an identification unit 213 identifies the position of the hand in the candidate region.

Description

Image processing apparatus, image processing system, image processing program, and image processing method
The present invention relates to an image processing apparatus, an image processing system, an image processing program, and an image processing method.

In recent years, with the widespread use of three-dimensional distance sensors, many techniques have been developed that detect a user's skeleton, joint positions, and the like from distance images and recognize gestures and other actions based on the detection results (see, for example, Patent Document 1 and Patent Document 2). The distance sensor and the distance image may also be referred to as a depth sensor and a depth image, respectively.

A method of thinning lines in a binarized image, a method of converting continuous points into a plurality of approximate line segments, a method of obtaining the angle formed by three points, prescribed values for the dimensions of the human body, and the like are also known (see, for example, Non-Patent Document 1 to Non-Patent Document 4).

Patent Document 1: US Patent Application Publication No. 2011/0085705
Patent Document 2: Japanese Patent Laid-Open No. 2015-94828
If the movements of a worker can be recognized by detecting changes in the three-dimensional position of the worker's hand at a work site using an environment-installed camera or a three-dimensional distance sensor, automatic detection of work mistakes, training in correct work movements, and the like can be realized. This makes it possible to improve work efficiency and work quality.

However, in an environment where objects such as work objects and tools exist around the worker and the worker performs the work while gripping an object, as in assembly work on a factory belt conveyor line, it is difficult for a computer to distinguish the worker from the objects shown in the image. For this reason, it becomes difficult to recognize the worker's movement from the three-dimensional position of the hand.

Note that such a problem occurs not only when a worker is holding a work object or tool at a factory, but also when a person is holding an object in another environment.
In one aspect, an object of the present invention is to accurately identify the position of a hand from a distance image of a hand holding an object.

In one proposal, the image processing apparatus includes a detection unit, an extraction unit, and a specifying unit. The detection unit detects a candidate region where a hand is captured from the distance image. The extraction unit extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments. The specifying unit specifies the position of the hand in the candidate region using a line segment that approximates the hand region.

According to the embodiment, the position of the hand can be accurately identified from a distance image of a hand holding an object.
FIG. 1 is a diagram illustrating a three-dimensional distance sensor.
FIG. 2 is a functional configuration diagram of an image processing apparatus.
FIG. 3 is a flowchart of image processing.
FIG. 4 is a functional configuration diagram illustrating a first specific example of the image processing apparatus.
FIG. 5 is a diagram illustrating the visual field range of an imaging device.
FIG. 6 is a flowchart illustrating a specific example of image processing.
FIG. 7 is a flowchart of candidate region detection processing.
FIG. 8 is a diagram illustrating a distance image and a background distance image.
FIG. 9 is a diagram illustrating a binarized difference image.
FIG. 10 is a flowchart of hand position detection processing.
FIG. 11 is a flowchart of line segment detection processing.
FIG. 12 is a diagram illustrating line segment information.
FIG. 13 is a flowchart of parameter calculation processing.
FIG. 14 is a flowchart of bending determination processing.
FIG. 15 is a diagram illustrating four line segments subject to the bending determination processing.
FIG. 16 is a flowchart of length determination processing.
FIG. 17 is a diagram illustrating four line segments subject to the length determination processing.
FIG. 18 is a diagram illustrating three line segments subject to the length determination processing.
FIG. 19 is a diagram illustrating a line segment subject to normal determination processing.
FIG. 20 is a flowchart of position specifying processing.
FIG. 21 is a diagram illustrating a maximum inscribed circle.
FIG. 22 is a functional configuration diagram illustrating a second specific example of the image processing apparatus.
FIG. 23 is a diagram illustrating the visual field range of the imaging device when obtaining the position of an upper limb.
FIG. 24 is a flowchart illustrating a specific example of image processing for obtaining the position of the upper limb.
FIG. 25 is a diagram illustrating a distance image and a background distance image when obtaining the position of the upper limb.
FIG. 26 is a flowchart of body region detection processing.
FIG. 27 is a flowchart of hand position detection processing including three-dimensional direction determination processing.
FIG. 28 is a flowchart of the three-dimensional direction determination processing.
FIG. 29 is a diagram illustrating a peripheral region.
FIG. 30 is a diagram illustrating the vector of a first principal component.
FIG. 31 is a flowchart of upper limb position detection processing.
FIG. 32 is a diagram illustrating the position of a wrist.
FIG. 33 is a functional configuration diagram of an image processing system.
FIG. 34 is a configuration diagram of an information processing apparatus.
Hereinafter, embodiments will be described in detail with reference to the drawings.

FIG. 1 shows an example of a three-dimensional distance sensor installed in an assembly line. The worker 101 performs product assembly work while holding an object 103, such as a tool or a part, on the work table 102. The visual field range 112 of the three-dimensional distance sensor 111 installed above the work table 102 includes the hands of the worker 101, so the three-dimensional distance sensor 111 can obtain a distance image showing the hands during the work.
However, if the worker 101 is close to the work table 102 or the object 103, or if the worker 101 is holding the object 103, the difference between the distance value of the hand and the distance value of the object 103 in the distance image becomes small. For this reason, it is difficult to separate the hand from the object 103 with a method that separates the foreground from the background based on distance values. Even when the hand region is detected based on the difference between the captured distance image and a background distance image containing only the background, both the hand and the object 103 are detected as the background difference, so it is still difficult to separate them.

For example, in the technique disclosed in Patent Document 1, the skeleton of the user's body parts is recognized by tracking using machine learning, and a tool held by the user is recognized by passive or active tracking. In passive tracking, an infrared retroreflective marker is attached to the tool, and the tool is recognized by detecting the marker with an external device such as a camera. In active tracking, a three-dimensional position sensor, an acceleration sensor, and the like are built into the tool, and these sensors notify the tracking system of the position of the tool.

However, with this technique, in work that uses an unknown object to which no marker or sensor is attached, it is difficult to correctly recognize the positions of the object and the hand when the object and the hand are close to each other. Furthermore, when the skeleton is recognized first using machine learning, it is desirable to learn the state of gripping the object in advance, but it is not realistic to learn such a gripping state in advance for every object of unknown shape and unknown size.
FIG. 2 shows a functional configuration example of the image processing apparatus. The image processing apparatus 201 in FIG. 2 includes a detection unit 211, an extraction unit 212, and a specifying unit 213.

FIG. 3 is a flowchart illustrating an example of the image processing performed by the image processing apparatus 201 in FIG. 2. First, the detection unit 211 detects, from a distance image, a candidate region in which a hand is captured (step 301). Next, the extraction unit 212 extracts a line segment that approximates a hand region from a plurality of line segments that approximate the candidate region, based on the angles between adjacent line segments among the plurality of line segments (step 302). The specifying unit 213 then specifies the position of the hand in the candidate region using the line segment that approximates the hand region (step 303).

With such an image processing apparatus 201, the position of the hand can be accurately specified from a distance image of a hand gripping an object.
FIG. 4 shows a first specific example of the image processing apparatus 201 in FIG. 2. The image processing apparatus 201 in FIG. 4 includes a detection unit 211, an extraction unit 212, a specifying unit 213, an output unit 411, and a storage unit 412. The extraction unit 212 includes a line segment detection unit 421 and a determination unit 422. The imaging device 401 is an example of a three-dimensional distance sensor; it captures a distance image 432 whose pixel values represent the distance from the imaging device 401 to the subject and outputs it to the image processing apparatus 201. For example, an infrared camera can be used as the imaging device 401.
FIG. 5 shows an example of the visual field range of the imaging device 401. The imaging device 401 in FIG. 5 is installed above the work table 102, and its visual field range 501 includes the work table 102, the left hand 502 and the right hand 503 of the worker 101, and the object 103 held by the right hand 503. The work table 102, the left hand 502, the right hand 503, and the object 103 therefore appear in the distance image 432 captured by the imaging device 401.

The storage unit 412 stores a background distance image 431 and the distance image 432. The background distance image 431 is a distance image captured in advance in a state where the left hand 502, the right hand 503, and the object 103 are not included in the visual field range 501.

The detection unit 211 detects a candidate region from the distance image 432 using the background distance image 431 and stores region information 433 indicating the detected candidate region in the storage unit 412. The candidate region is a region in which at least one of the left hand 502 and the right hand 503 of the worker 101 is estimated to be captured. For example, a region in contact with an end of the distance image 432 is detected as a candidate region.
The line segment detection unit 421 obtains a curve by thinning the candidate region indicated by the region information 433, obtains a plurality of connected line segments that approximate the curve as the plurality of line segments that approximate the candidate region, and stores line segment information 434 indicating those line segments in the storage unit 412.

The determination unit 422 obtains the angle between line segments for each combination of two adjacent line segments included in the plurality of line segments indicated by the line segment information 434, and obtains the length in the three-dimensional space corresponding to each line segment. The determination unit 422 then determines whether each line segment corresponds to the hand region using the obtained angles and lengths, and extracts the line segment that approximates the hand region from the plurality of line segments indicated by the line segment information 434.
At this time, when the angle between two line segments is smaller than a threshold θmin, the determination unit 422 excludes, from the line segment candidates that approximate the hand region, another line segment whose distance to the end of the distance image 432 is farther than those two line segments. θmin may be a threshold representing the lower limit of the bending angle of the wrist or elbow joint.

When the angle between the two line segments is larger than the threshold θmin and smaller than a threshold θmax, the determination unit 422 determines whether to exclude, from the line segment candidates that approximate the hand region, the line segment whose distance to the end of the distance image 432 is farther of the two. For this determination, the length in the three-dimensional space corresponding to the line segment whose distance to the end is shorter is used.
Next, from among the remaining line segments, the determination unit 422 extracts the portion that approximates the hand region from the line segment whose distance to the end of the distance image 432 is the farthest, based on the distances from each of a plurality of points on that line segment to the contour of the candidate region. The determination unit 422 then stores hand region line segment information 435 indicating the extracted portion in the storage unit 412.

The specifying unit 213 specifies the position of the hand in the candidate region using the contour of the candidate region indicated by the region information 433 and the line segment portion indicated by the hand region line segment information 435, and stores position information 436 indicating the specified position in the storage unit 412. The output unit 411 outputs a recognition result based on the position information 436.

The output unit 411 may be a display unit that displays the recognition result on a screen, or a transmission unit that transmits the recognition result to another image processing apparatus. The recognition result may be a trajectory indicating changes in the three-dimensional position of the hand, or information indicating the worker's motion estimated from the hand trajectory.

With the image processing apparatus 201 in FIG. 4, line segments corresponding to the object 103 can be excluded from the line segments approximating the candidate region detected from the distance image 432, based on the angles between line segments and the lengths of the line segments in the three-dimensional space. Then, by specifying the position of the hand within the region including the line segment that is the farthest from the end of the distance image 432 among the remaining line segments, the position of the right hand 503 can be accurately specified even when the object 103 is an unknown object. This improves the recognition accuracy of the position of a hand close to an unknown object, and consequently the recognition accuracy of the worker's motion.
FIG. 6 is a flowchart showing a specific example of the image processing performed by the image processing apparatus 201 in FIG. 4. First, the detection unit 211 performs the candidate region detection process (step 601), and then the extraction unit 212 and the specifying unit 213 perform the hand position detection process (step 602).

FIG. 7 is a flowchart showing an example of the candidate region detection process in step 601 of FIG. 6. First, the detection unit 211 subtracts the pixel value of each pixel of the background distance image 431 from the pixel value (distance value) of the corresponding pixel of the distance image 432 to generate a difference image, and binarizes the generated difference image (step 701).
FIG. 8 shows examples of the distance image 432 and the background distance image 431. FIG. 8(a) shows an example of the distance image 432, and FIG. 8(b) shows an example of the background distance image 431. Since the pixel values of the distance image 432 and the background distance image 431 represent the distance from the imaging device 401 to the subject, they become smaller as the subject gets closer to the imaging device 401 and larger as the subject gets farther from it. In this case, the difference of the background pixel values common to the distance image 432 and the background distance image 431 is close to 0, whereas the difference of the foreground pixel values, which are closer to the imaging device 401 than the background, is negative.

The detection unit 211 therefore uses a negative predetermined value as a threshold T1 and compares the pixel value difference with T1: when the difference is less than T1, the pixel value of the difference image is set to 255 (white), and when the difference is greater than or equal to T1, the pixel value of the difference image is set to 0 (black). In this way, the difference image is binarized.
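As a rough illustration of step 701, the background subtraction and binarization can be written as follows; this is a minimal sketch assuming NumPy arrays of distance values, and the threshold value used for T1 is an arbitrary illustrative choice rather than a value given in this description.

```python
import numpy as np

def binarize_difference(distance_img, background_img, t1=-30):
    """Subtract the background distance image from the distance image and
    binarize the difference: pixels whose difference is below the negative
    threshold T1 (clearly closer to the camera than the background) become
    white (255), and all other pixels become black (0)."""
    diff = distance_img.astype(np.int32) - background_img.astype(np.int32)
    return np.where(diff < t1, 255, 0).astype(np.uint8)
```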
FIG. 9 shows examples of the binarized difference image. FIG. 9(a) shows an example of the binary image generated in step 701. When the difference image is binarized, not only the pixel values of both of the worker's hands but also the pixel values of objects close to the hands are set to white.

Next, the detection unit 211 performs opening and closing on each pixel of the binary image (step 702). First, performing the opening reduces white pixels and removes small white regions. Performing the closing afterwards changes the black isolated points generated by the opening back to white. As a result, the binary image of FIG. 9(b) is generated from the binary image of FIG. 9(a).
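Step 702 corresponds directly to morphological opening followed by closing; a short sketch assuming OpenCV, with an illustrative 3 x 3 structuring element.

```python
import cv2
import numpy as np

def clean_binary(binary, kernel_size=3):
    """Remove small white regions (opening), then fill the black isolated
    points produced by the opening (closing)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```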
Next, the detection unit 211 selects each white pixel as a target pixel and obtains the differences between the pixel value of the target pixel in the distance image 432 and the pixel values in the distance image 432 of the white pixels adjacent to the target pixel above, below, to the left, and to the right. The detection unit 211 then compares the maximum absolute value of these differences with a predetermined threshold T2, and when the maximum value is greater than or equal to T2, changes the target pixel from a white pixel to a black pixel (step 703). As a result, the binary image of FIG. 9(c) is generated from the binary image of FIG. 9(b).
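Step 703 can be sketched as a per-pixel check of depth discontinuities against the 4-neighbours; a straightforward (unoptimized) NumPy version is shown below, with T2 as an illustrative threshold.

```python
import numpy as np

def cut_depth_edges(binary, distance_img, t2=50):
    """Turn a white pixel black when its distance value differs from any of
    its white 4-neighbours by T2 or more (a depth discontinuity)."""
    h, w = binary.shape
    out = binary.copy()
    d = distance_img.astype(np.int32)
    for y in range(h):
        for x in range(w):
            if binary[y, x] != 255:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] == 255:
                    if abs(d[y, x] - d[ny, nx]) >= t2:
                        out[y, x] = 0
                        break
    return out
```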
Next, the detection unit 211 performs contour detection processing to detect a plurality of white regions, and selects, from the detected white regions, the white regions that satisfy the following conditions as candidate regions (step 704); a minimal selection sketch in code follows this list.
(1) The white region is in contact with an end of the binary image.
(2) The area of the white region is greater than or equal to a predetermined value.
(3) The white region is among the two largest white regions by area.
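The selection in step 704 can be sketched with connected-component statistics instead of explicit contour tracing; the minimum area, the choice of the lower edge, and the maximum of two regions below are illustrative assumptions for a single worker reaching in from the lower edge of the image.

```python
import cv2
import numpy as np

def select_candidate_regions(binary, min_area=2000, max_regions=2):
    """Keep white regions that touch the lower edge of the image, have an
    area of at least min_area, and are among the max_regions largest ones."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    h = binary.shape[0]
    candidates = []
    for label in range(1, n):                      # label 0 is the background
        top = stats[label, cv2.CC_STAT_TOP]
        height = stats[label, cv2.CC_STAT_HEIGHT]
        area = stats[label, cv2.CC_STAT_AREA]
        touches_bottom = (top + height) >= h       # region reaches the last row
        if touches_bottom and area >= min_area:
            candidates.append((area, label))
    candidates.sort(reverse=True)                  # largest areas first
    return [(labels == label).astype(np.uint8) * 255
            for _, label in candidates[:max_regions]]
```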
For example, in the case of the distance image 432 shown in FIG. 8(a), the worker's arms extend from the lower end (the body side) of the image toward the upper end, so white regions in contact with the lower end of the binary image are selected as candidate regions. In this case, the single white region shown in FIG. 9(d) is selected from the binary image of FIG. 9(c).

Next, the detection unit 211 smooths the contour of the selected white region (step 705). In step 703, unevenness may occur in the contour of the white region due to wrinkles in the clothes worn by the worker and similar effects, and the processing of step 705 smooths out such unevenness. For example, closing may be used as the smoothing processing. The smoothing processing generates the binary image of FIG. 9(e) from the binary image of FIG. 9(d).
Next, when a white region is in contact with the lower end of the binary image at two locations and the number of contour pixels in contact with the lower end is greater than or equal to a predetermined value, the detection unit 211 divides that white region into two white regions, each in contact with the lower end at only one location (step 706). The detection unit 211 then generates region information 433 indicating the white regions generated by the division.

For example, when the y-axis is set from the upper end toward the lower end of the binary image and the x-axis from the left end toward the right end, the y-coordinate value increases toward the lower end of the binary image. In this case, the detection unit 211 obtains, from the contours, the contour portion sandwiched between the two contour portions in contact with the lower end, and obtains the x coordinate x1 of the topmost pixel (the pixel with the smallest y coordinate) of that portion. Then, by changing all white pixels whose x coordinate is x1 in the white region to black pixels, the detection unit 211 can divide the white region into two white regions.
As a result, the binary image of FIG. 9(f) is generated from the binary image of FIG. 9(e). In the binary image of FIG. 9(f), the two white regions generated by the division correspond to a region including the left hand and a region including the right hand, respectively.

The detection unit 211 therefore compares the x coordinates of the contour portions included in the two white regions, determines the white region with the smaller x coordinate as the candidate region for the left hand, and determines the white region with the larger x coordinate as the candidate region for the right hand. In this case, the detection unit 211 sets the variable nHands, which represents the number of detected candidate regions, to 2.

On the other hand, when two white regions each in contact with the lower end of the binary image at only one location are selected in step 704, the detection unit 211 sets nHands to 2 without dividing either white region. When one white region in contact with the lower end of the binary image at only one location is selected, the detection unit 211 sets nHands to 1.
Note that, as the pixel values of the distance image 432 and the background distance image 431, it is also possible to use values that become larger as the subject gets closer to the imaging device 401 and smaller as the subject gets farther from it. In this case, the difference of the foreground pixel values in step 701 becomes positive. The detection unit 211 then uses a positive predetermined value as the threshold T1, sets the pixel value of the difference image to 255 (white) when the difference is larger than T1, and sets the pixel value of the difference image to 0 (black) when the difference is less than or equal to T1.

When a plurality of workers are working on the work table 102 and the visual field range of the imaging device 401 includes N hands (N being an integer of 3 or more), the detection unit 211 may select up to N white regions in descending order of area in step 704.
FIG. 10 is a flowchart showing an example of the hand position detection process in step 602 of FIG. 6. First, the extraction unit 212 sets the control variable i, which indicates the i-th candidate region among the candidate regions indicated by the region information 433, to 0 (step 1001), and compares i with nHands (step 1002). When i is less than nHands (step 1002, YES), the line segment detection unit 421 performs the line segment detection process for the i-th candidate region (step 1003).

Next, the determination unit 422 performs the parameter calculation process (step 1004), the bending determination process (step 1005), and the length determination process (step 1006). The specifying unit 213 then performs the position specifying process (step 1007).

Next, the extraction unit 212 increments i by 1 (step 1008) and repeats the processing from step 1002. When i reaches nHands (step 1002, NO), the extraction unit 212 ends the process.
FIG. 11 is a flowchart showing an example of the line segment detection process in step 1003 of FIG. 10. First, the line segment detection unit 421 thins the i-th candidate region (step 1101). For example, the line segment detection unit 421 can thin the candidate region using a thinning algorithm such as Tamura's method, the Zhang-Suen method, or the NWG method described in Non-Patent Document 1. When branches occur during the thinning, the line segment detection unit 421 keeps only the longest branch and generates a single curve consisting of an array of consecutive points.
Next, the line segment detection unit 421 approximates the curve with a plurality of connected line segments and generates line segment information 434 indicating those line segments (step 1102). For example, the line segment detection unit 421 can convert the curve into a plurality of approximate line segments using the Ramer-Douglas-Peucker algorithm described in Non-Patent Document 2, keeping the deviation between the curve and the approximate line segments within a predetermined tolerance.
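Steps 1101 and 1102 can be sketched with OpenCV, assuming the contrib module is available for thinning (cv2.ximgproc.thinning, Zhang-Suen by default) and using cv2.approxPolyDP as a Ramer-Douglas-Peucker implementation; tracing the skeleton pixels into one ordered curve (and keeping only the longest branch) is assumed to be done separately, and the tolerance value is illustrative.

```python
import cv2
import numpy as np

def thin_region(candidate_mask):
    """Thin a binary candidate region (0/255) to a one-pixel-wide skeleton."""
    return cv2.ximgproc.thinning(candidate_mask)

def approximate_curve(curve_points, epsilon=5.0):
    """Approximate an ordered array of (x, y) curve points with connected
    line segments (Ramer-Douglas-Peucker); returns the segment end points."""
    curve = np.asarray(curve_points, dtype=np.int32).reshape(-1, 1, 2)
    approx = cv2.approxPolyDP(curve, epsilon, False)  # False: open (non-closed) curve
    return approx.reshape(-1, 2)
```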
With such a line segment detection process, the candidate region is converted into a combination of connected line segments, which makes it possible to determine the correspondence between each line segment and the upper limb, the hand, an object, and so on with simple determination processing.
FIG. 12 shows an example of line segment information 434 in array format. The array number is identification information indicating one of the two end points of each line segment in the binary image, and the x coordinate and y coordinate represent the coordinates (x, y) of the end point, which are common to the binary image and the distance image 432. The point corresponding to array number 0 is located at the lower end of the candidate region, and as the array number increases, the corresponding point moves away from the lower end. The distance value Z represents the pixel value at the coordinates (x, y) of the distance image 432.

The line segment information 434 in FIG. 12 represents n connected line segments; the two ends of the j-th line segment (j = 0 to n - 1) are the point corresponding to array number j and the point corresponding to array number j + 1. The j-th line segment and the (j + 1)-th line segment are adjacent and are connected at the point corresponding to array number j + 1. In this case, the line segment detection unit 421 sets the variable nPts, which represents the number of end points, to n + 1.
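The array format of FIG. 12 can be held as a simple ordered list of end points; the sketch below uses illustrative coordinate and distance values, not the values of FIG. 12 itself.

```python
from dataclasses import dataclass

@dataclass
class EndPoint:
    x: int       # x coordinate shared by the binary image and the distance image
    y: int       # y coordinate
    z: float     # distance value Z at (x, y) in the distance image 432

# pts[j] and pts[j + 1] are the two ends of the j-th line segment, so
# nPts = n + 1 end points describe n connected line segments.
pts = [
    EndPoint(x=320, y=479, z=950.0),  # array number 0: lower end of the candidate region
    EndPoint(x=310, y=360, z=930.0),  # array number 1
    EndPoint(x=295, y=250, z=900.0),  # array number 2
]
nPts = len(pts)
```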
FIG. 13 is a flowchart showing an example of the parameter calculation process in step 1004 of FIG. 10. First, the determination unit 422 sets the control variable j, which represents an array number of the line segment information 434, to 0 (step 1301), and compares j with nPts - 1 (step 1302). When j is less than nPts - 1 (step 1302, YES), the determination unit 422 calculates the length lenj, in the three-dimensional space, of the j-th line segment whose two ends are the points corresponding to array number j and array number j + 1 (step 1303).

For example, the determination unit 422 can obtain the length lenj with a pinhole camera model using the coordinates (x, y) of the two points corresponding to array number j and array number j + 1. With the pinhole camera model, the length len1 in the three-dimensional space of the line segment between a point (x1, y1) and a point (x2, y2) in the binary image is calculated by the following equations.

X1 = (Z1 × (x1 - cx)) / fx   (1)
Y1 = (Z1 × (y1 - cy)) / fy   (2)
X2 = (Z2 × (x2 - cx)) / fx   (3)
Y2 = (Z2 × (y2 - cy)) / fy   (4)
len1 = ((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2)^(1/2)   (5)

Z1 and Z2 represent the distance values Z of the point (x1, y1) and the point (x2, y2), respectively, and (cx, cy) represents the coordinates of the principal point in the binary image; for example, the center of the binary image is used as the principal point. fx and fy are the focal lengths expressed in pixel units in the x-axis and y-axis directions, respectively. The origin of the coordinate system representing the coordinates (X, Y, Z) in the three-dimensional space may be the installation position of the imaging device 401.
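A minimal sketch of the length computation of step 1303 using equations (1) to (5); the intrinsic parameters fx, fy, cx, and cy below are illustrative placeholders and would in practice come from the calibration of the imaging device 401.

```python
import numpy as np

def backproject(x, y, z, fx, fy, cx, cy):
    """Convert an image point (x, y) with distance value Z into a 3D point
    (X, Y, Z) using the pinhole camera model (equations (1)-(4))."""
    X = z * (x - cx) / fx
    Y = z * (y - cy) / fy
    return np.array([X, Y, z])

def segment_length_3d(p1, p2, fx=570.0, fy=570.0, cx=320.0, cy=240.0):
    """3D length of the segment between two image points p1 = (x1, y1, Z1)
    and p2 = (x2, y2, Z2), as in equation (5)."""
    a = backproject(*p1, fx, fy, cx, cy)
    b = backproject(*p2, fx, fy, cx, cy)
    return float(np.linalg.norm(a - b))

# Example with pixel coordinates and distance values in millimetres.
print(segment_length_3d((320, 400, 900.0), (300, 250, 870.0)))
```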
Next, the determination unit 422 compares j with nPts - 2 (step 1304). When j is less than nPts - 2 (step 1304, YES), the determination unit 422 calculates the angle θj between the j-th line segment and the (j + 1)-th line segment, whose two ends are the points corresponding to array number j + 1 and array number j + 2 (step 1305).

For example, the determination unit 422 can obtain the angle θj from the inner-product calculation described in Non-Patent Document 3, using the coordinates (x, y) of the three points corresponding to array number j, array number j + 1, and array number j + 2.
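The angle θj of step 1305 is the angle at the shared end point pts[j + 1], obtained from the inner product of the two segment direction vectors; a minimal NumPy sketch follows.

```python
import numpy as np

def segment_angle(p0, p1, p2):
    """Angle (in degrees) between segment p0-p1 and segment p1-p2,
    measured at the shared point p1 via the inner product."""
    v1 = np.asarray(p0, dtype=float) - np.asarray(p1, dtype=float)
    v2 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# A straight continuation gives about 180 degrees, a right-angle bend gives 90 degrees.
print(segment_angle((0, 0), (0, 10), (0, 20)))   # ~180.0
print(segment_angle((0, 0), (0, 10), (10, 10)))  # 90.0
```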
Next, the determination unit 422 increments j by 1 (step 1306) and repeats the processing from step 1302. When j reaches nPts - 2 (step 1304, NO), the determination unit 422 skips the processing of step 1305 and repeats the processing from step 1306. When j reaches nPts - 1 (step 1302, NO), the determination unit 422 ends the process.
FIG. 14 is a flowchart illustrating an example of the bending determination process in step 1005 of FIG. 10. In the bending determination process, line segments estimated to exist beyond the hand are excluded based on the determination results for the angles θj between line segments.

Given the arrangement of joints in the human body, there are two bending points between the upper arm and the hand: the elbow and the wrist. Therefore, the maximum number of bends between the lower end of a candidate region showing the arm and hand and the hand itself is two, and the third and subsequent bends are estimated to be bends of a finger or of an object other than the hand. Even when the number of bends is two or less, if the angle θj is smaller than the movable angle of the elbow or wrist, the bend is estimated to be caused by a finger or an object.
First, the determination unit 422 sets the control variable j to 0 and the variable Nbend, which represents the number of bends, to 0 (step 1401), and compares j with nPts - 2 (step 1402). When j is less than nPts - 2 (step 1402, YES), the determination unit 422 compares θj with θmin (step 1403). For example, θmin can be determined based on the movable angle of the elbow or wrist among the movable angles of the human body's joints described in Non-Patent Document 4.
When θj is greater than or equal to θmin (step 1403, YES), the determination unit 422 compares θj with θmax (step 1404). θmax is a threshold for determining that there is no bend, and is set to a value larger than θmin and smaller than 180°. For example, θmax may be an angle in the range of 150° to 170°.

When θj is less than θmax (step 1404, NO), the determination unit 422 increments Nbend by 1 (step 1405) and compares Nbend with 2 (step 1406). When Nbend is 2 or less (step 1406, NO), the determination unit 422 increments j by 1 (step 1408) and repeats the processing from step 1402.

When Nbend exceeds 2 (step 1406, YES), the determination unit 422 deletes the points after array number j + 2 from the line segment information 434 and changes nPts from n + 1 to j + 2 (step 1407). As a result, the line segments corresponding to a finger or an object are deleted from the line segment information 434.
When θj is less than θmin (step 1403, NO), the determination unit 422 performs the processing of step 1407. In this way, when θj is smaller than the movable angle of the elbow or wrist, the line segments corresponding to a finger or an object are deleted from the line segment information 434.

When θj is greater than or equal to θmax (step 1404, YES), the determination unit 422 determines that the point of array number j + 1 does not correspond to a bend, skips the processing of steps 1405 and 1406, and repeats the processing from step 1408. When j reaches nPts - 2 (step 1402, NO), the determination unit 422 ends the process.

With such a bending determination process, the number of line segments indicated by the line segment information 434 can be reduced, and the line segment candidates that approximate the hand region can be narrowed down.
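A sketch of the bending determination of FIG. 14, assuming the list of angles (thetas[j] being the angle at the point of array number j + 1) has already been computed as in step 1305; the θmin and θmax values below are illustrative (the description only states that θmax lies between θmin and 180°, for example 150° to 170°).

```python
def truncate_by_bends(pts, thetas, theta_min=90.0, theta_max=160.0, max_bends=2):
    """Count bends along the connected segments and cut the point array at the
    first angle below theta_min, or as soon as more than max_bends bends
    (angles between theta_min and theta_max) have been found."""
    n_bend = 0
    for j, theta in enumerate(thetas):
        if theta < theta_min:
            return pts[:j + 2]               # sharper than a wrist or elbow can bend
        if theta < theta_max:                # a genuine bend (elbow or wrist)
            n_bend += 1
            if n_bend > max_bends:
                return pts[:j + 2]           # third bend: finger or held object
    return pts                               # nothing to remove
```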
 図15は、折れ曲がり判定処理の対象となる4本の線分の例を示している。pts[j](j=0~4)は、配列番号jの点を表し、θj(j=0~2)は、pts[j]、p
ts[j+1]、及びpts[j+2]の3点から計算された線分間の角度を表す。
FIG. 15 shows an example of four line segments to be subjected to bending determination processing. pts [j] (j = 0 to 4) represents a point of the array element number j, and θ j (j = 0 to 2) represents pts [j], p
This represents the angle between line segments calculated from the three points ts [j + 1] and pts [j + 2].
 pts[0]及びpts[1]を両端とする線分は上腕に対応し、pts[1]及びpts[2]を両端とする線分は前腕に対応し、pts[1]は肘に対応する。また、pts[2]及びpts[3]を両端とする線分は手に対応し、pts[2]は手首に対応する。一方、pts[3]及びpts[4]を両端とする線分は、手が把持している物体に対応し、pts[3]は指の関節に対応する。 A line segment with both ends of pts [0] and pts [1] corresponds to the upper arm, a line segment with both ends of pts [1] and pts [2] corresponds to the forearm, and pts [1] corresponds to the elbow. To do. In addition, line segments having both ends of pts [2] and pts [3] correspond to the hand, and pts [2] corresponds to the wrist. On the other hand, a line segment having both ends of pts [3] and pts [4] corresponds to an object held by the hand, and pts [3] corresponds to a finger joint.
 In this case, when j = 2, Nbend becomes 3 and exceeds 2, so pts[4] is deleted from the line segment information 434 and nPts is changed from 5 to 4. As a result, the line segment corresponding to the object is deleted.
 FIG. 16 is a flowchart showing an example of the length determination process in step 1006 of FIG. 10. In the length determination process, line segments that are presumed to lie beyond the hand are excluded based on the determination results for the angle θj between line segments and the length lenj of each line segment.
 First, the determination unit 422 sets the control variable j to 0, sets the variable len, which represents a length in the three-dimensional space, to 0, and sets the variable ctr, which represents a number of line segments, to 0 (step 1601), and then compares j with nPts-1 (step 1602). If j is less than nPts-1 (step 1602, YES), the determination unit 422 adds lenj to len (step 1603) and compares θj with θmax (step 1604). If θj is less than θmax (step 1604, NO), the determination unit 422 compares len with lenmax (step 1605).
 lenmax is an upper limit of the forearm length and can be determined, for example, based on the forearm length described in Non-Patent Document 4. If the worker's height is known, lenmax may instead be determined from the height based on Leonardo da Vinci's Vitruvian Man proportions.
 If len is larger than lenmax (step 1605, YES), the determination unit 422 deletes the points with array index j+2 and later from the line segment information 434 and changes nPts to j+2 (step 1606). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434. The determination unit 422 then performs the normal line determination process of steps 1613 to 1620.
 If θj is equal to or larger than θmax (step 1604, YES), the determination unit 422 determines that the point with array index j+1 does not correspond to a bend, increments j by 1 (step 1611), and repeats the processing from step 1602. When j reaches nPts-1 (step 1602, NO), the determination unit 422 performs the normal line determination process of steps 1613 to 1620.
 If len is equal to or less than lenmax (step 1605, NO), the determination unit 422 compares len with lenmin (step 1607). If len is equal to or larger than lenmin (step 1607, NO), the determination unit 422 sets len to 0 (step 1612) and performs the processing from step 1611.
 If len is less than lenmin (step 1607, YES), the determination unit 422 increments ctr by 1 (step 1608) and compares ctr with 2 (step 1609). If ctr is 2 or less (step 1609, NO), the determination unit 422 performs the processing from step 1612.
 If ctr exceeds 2 (step 1609, YES), the determination unit 422 deletes the points with array index j+1 and later from the line segment information 434 and changes nPts to j+1 (step 1610). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434. The determination unit 422 then performs the normal line determination process of steps 1613 to 1620.
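 For illustration only, a minimal Python sketch of the core loop of the length determination process (without the normal line determination of steps 1613 to 1620) is shown below; the names seg_len, len_min, and len_max are hypothetical stand-ins for lenj, lenmin, and lenmax, and the angles are assumed to be precomputed as in the bending determination sketch.

def length_determination(pts, theta, seg_len, theta_max, len_min, len_max):
    # seg_len[j]: 3D length lenj of the segment pts[j]-pts[j+1];
    # theta[j]: angle at pts[j+1] (not defined for the last segment).
    total = 0.0                                       # accumulated length len
    ctr = 0                                           # count of short runs
    for j in range(len(pts) - 1):                     # while j < nPts - 1
        total += seg_len[j]
        if j < len(theta) and theta[j] < theta_max:   # a bend at pts[j+1]
            if total > len_max:                       # longer than any forearm
                return pts[:j + 2]                    # drop indices j+2 and later
            if total < len_min:                       # too short between bends
                ctr += 1
                if ctr > 2:
                    return pts[:j + 1]                # drop indices j+1 and later
            total = 0.0                               # reset at every bend
    return pts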
 FIG. 17 shows an example of four line segments that are subjected to the length determination process. lenj (j = 0 to 3) represents the length, in the three-dimensional space, of the line segment with endpoints pts[j] and pts[j+1]. len0 corresponds to the length of part of the upper arm, len1 corresponds to the length of the forearm, len2 corresponds to the length of the hand, and len3 corresponds to the length of the object held by the hand.
 First, when j = 0, len = len0. However, since θ0 is less than θmax and len is between lenmin and lenmax, len is reset to 0. Next, when j = 1, len = len1. At this time, since θ1 is equal to or larger than θmax, len is not reset.
 Then, when j = 2, len = len1 + len2. At this time, since θ2 is less than θmax and len exceeds lenmax, pts[4] is deleted from the line segment information 434 and nPts is changed from 5 to 4. As a result, the line segment corresponding to the object is deleted.
 FIG. 18 shows an example of three line segments that are subjected to the length determination process. len0 corresponds to the length of part of the forearm, len1 corresponds to the length of the hand, and len2 corresponds to the length of the object held by the hand.
 First, when j = 0, len = len0. However, since θ0 is less than θmax and len is less than lenmin, ctr is changed to 1 and len is reset to 0. Next, when j = 1, len = len1. However, since θ1 is less than θmax and len is less than lenmin, ctr is changed to 2 and len is reset to 0.
 Then, when j = 2, len = len2. However, since θ2 is less than θmax and len is less than lenmin, ctr is changed to 3. Since ctr now exceeds 2, pts[3] is deleted from the line segment information 434 and nPts is changed from 4 to 3. As a result, the line segment corresponding to the object is deleted.
 In the normal line determination process of steps 1613 to 1620, for each of a plurality of points on the line segment that is farthest from the lower end of the candidate region, among the line segments indicated by the array of points remaining in the line segment information 434, two intersections between a straight line passing through that point and the contour of the candidate region are obtained. Then, based on the distance in the three-dimensional space corresponding to the distance between these intersections, the portion approximating the hand region is extracted from that line segment. The line segment farthest from the lower end of the candidate region is the line segment with endpoints pts[nPts-2] and pts[nPts-1], and is presumed to be the line segment corresponding to the hand.
 First, the determination unit 422 divides the line segment L with endpoints pts[nPts-2] and pts[nPts-1] at intervals of a predetermined number of pixels to obtain m points (m is an integer of 2 or more) on the line segment L, and obtains, at each point, a normal line that intersects the line segment L perpendicularly (step 1613). The determination unit 422 then obtains the intersections between each of the m normal lines and the contour of the candidate region. Since the line segment L lies inside the candidate region and the contour of the candidate region lies on both sides of the line segment L, two intersections are obtained between each normal line and the contour.
 Next, the determination unit 422 sets the control variable k, which indicates one of the normal lines, to 0 (step 1614) and compares k with m (step 1615). The normal line corresponding to k = 0 intersects the line segment L at its lower end, and as k increases the corresponding normal line moves away from the lower end.
 If k is less than m (step 1615, YES), the determination unit 422 obtains the distance between the two intersections of the k-th normal line and the contour, and calculates the length n_lenk in the three-dimensional space corresponding to the obtained distance (step 1616). The determination unit 422 then compares n_lenk with n_lenmax (step 1617). n_lenmax is an upper limit of the hand width and may be determined, for example, based on the hand width described in Non-Patent Document 4 or based on the Vitruvian Man proportions.
 If n_lenk is equal to or less than n_lenmax (step 1617, NO), the determination unit 422 compares n_lenk with n_lenmin (step 1618). n_lenmin is a lower limit of the hand width and may be determined, for example, based on the hand width described in Non-Patent Document 4 or based on the Vitruvian Man proportions.
 If n_lenk is equal to or larger than n_lenmin (step 1618, NO), the determination unit 422 increments k by 1 (step 1620) and repeats the processing from step 1615. When k reaches m (step 1615, NO), the determination unit 422 ends the processing.
 If n_lenk is larger than n_lenmax (step 1617, YES), the determination unit 422 obtains the intersection of the k-th normal line and the line segment L, and records the obtained intersection as pts[nPts-1]' (step 1619). As a result, the portion of the line segment L from the point pts[nPts-2] to the point pts[nPts-1]' is extracted as the portion approximating the hand region. The determination unit 422 then generates hand region line segment information 435 including the x coordinates, y coordinates, and distance values Z of pts[nPts-2] and pts[nPts-1]'.
 Also when n_lenk is less than n_lenmin (step 1618, YES), the determination unit 422 performs the processing of step 1619.
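 For illustration only, the scanning part of the normal line determination process may be sketched as follows; the helper width_at is hypothetical and is assumed to return, for a point on the line segment L, the length in the three-dimensional space between the two intersections of the normal line at that point with the contour of the candidate region.

import numpy as np

def normal_determination(p_start, p_end, m, width_at, n_len_min, n_len_max):
    # p_start corresponds to pts[nPts-2] (lower-end side), p_end to pts[nPts-1].
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    for k in range(m):
        point = p_start + (p_end - p_start) * (k / (m - 1))   # k-th sample on L
        w = width_at(point)                                    # n_len_k
        if w > n_len_max or w < n_len_min:   # leaves the plausible hand-width range
            return point                     # recorded as pts[nPts-1]'
    return p_end                             # the whole segment is kept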
 FIG. 19 shows an example of a line segment that is subjected to the normal line determination process. In this example, the line segment with endpoints pts[2] and pts[3] is divided at the predetermined interval, and five normal lines are generated. Therefore, m = 5.
 First, when k = 0, n_len0 is between n_lenmin and n_lenmax, so k is incremented. Next, when k = 1, n_len1 is between n_lenmin and n_lenmax, so k is incremented. Next, when k = 2, n_len2 is between n_lenmin and n_lenmax, so k is incremented.
 Then, when k = 3, n_len3 exceeds n_lenmax, so the intersection pts[3]' on the line segment L is obtained. As a result, the portion from the point pts[2] to the point pts[3]' is extracted as the portion approximating the hand region.
 According to the length determination process of FIG. 16, line segments corresponding to the object that were not deleted by the bending determination process can be deleted, and the line segment that approximates the hand region is specified. The portion approximating the hand region can then be specified on that line segment.
 FIG. 20 is a flowchart showing an example of the position specifying process in step 1007 of FIG. 10. In the position specifying process, the center position of the palm is specified by obtaining the maximum inscribed circle inscribed in the contour of the candidate region within the hand region corresponding to the line segment indicated by the hand region line segment information 435.
 Since the palm is wider than the closed fingers, the inscribed circle at the center of the palm is considered to be larger than an inscribed circle in the region of the closed fingers. Also, since the palm is wider than the wrist, the inscribed circle at the center of the palm is considered to be larger than an inscribed circle in the region of the wrist. Therefore, the center of the maximum inscribed circle in the hand region can be regarded as the center of the palm.
 First, the specifying unit 213 sets a scanning range between the point pts[nPts-2] and the point pts[nPts-1]', and obtains, for each scanning point within the scanning range, the inscribed circle that is centered on that scanning point and inscribed in the contour of the candidate region (step 2001). The specifying unit 213 then obtains the coordinates (xp, yp) of the center of the largest of the inscribed circles centered on the respective scanning points. The scanning range may be the line segment with endpoints pts[nPts-2] and pts[nPts-1]', or may be a region obtained by expanding that line segment by a predetermined number of pixels in the x direction and the y direction.
 For example, the specifying unit 213 obtains, for each scanning point, the minimum value d of the distance from that scanning point to the contour of the candidate region, and obtains the coordinates (xmax, ymax) of the scanning point at which the minimum value d is largest. The specifying unit 213 then records the coordinates (xmax, ymax) as the coordinates (xp, yp) of the center of the maximum inscribed circle. In this case, the minimum distance d at the scanning point (xmax, ymax) is the radius of the maximum inscribed circle.
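 For illustration only, the search for the maximum inscribed circle described in this paragraph may be sketched as follows; scan_points and contour_points are hypothetical arrays of pixel coordinates for the scanning range and the contour of the candidate region.

import numpy as np

def palm_center(scan_points, contour_points):
    scan = np.asarray(scan_points, dtype=float)        # shape (S, 2)
    contour = np.asarray(contour_points, dtype=float)  # shape (C, 2)
    dists = np.linalg.norm(scan[:, None, :] - contour[None, :, :], axis=2)
    d_min = dists.min(axis=1)      # minimum distance d to the contour per scan point
    best = int(d_min.argmax())     # scan point whose inscribed circle is largest
    return scan[best], float(d_min[best])   # center (xp, yp) and the radius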
 FIG. 21 shows an example of the maximum inscribed circle. In this example, the line segment 2101 with endpoints pts[nPts-2] and pts[nPts-1]' is set as the scanning range, and the maximum inscribed circle 2103 centered on the point 2102 on the line segment 2101 is obtained.
 Next, the specifying unit 213 obtains the point in the three-dimensional space corresponding to the center of the maximum inscribed circle, and generates position information 436 indicating the position of the obtained point (step 2002). For example, the specifying unit 213 can obtain the coordinates (Xp, Yp, Zp) of the point in the three-dimensional space from the coordinates (xp, yp) of the center of the maximum inscribed circle and the distance value Zp, using a pinhole camera model. In this case, Xp and Yp are calculated by the following equations.
Xp = (Zp × (xp - cx)) / fx   (11)
Yp = (Zp × (yp - cy)) / fy   (12)
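 For illustration only, equations (11) and (12) correspond to the following short Python function; fx, fy, cx, and cy denote the focal lengths and the principal point of the pinhole camera model used above.

def backproject(xp, yp, Zp, fx, fy, cx, cy):
    # Map the palm-center pixel (xp, yp) with distance value Zp to the
    # 3D coordinates (Xp, Yp, Zp) using equations (11) and (12).
    Xp = Zp * (xp - cx) / fx
    Yp = Zp * (yp - cy) / fy
    return Xp, Yp, Zp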
 FIG. 22 shows a second specific example of the image processing apparatus 201 of FIG. 2. The image processing apparatus 201 of FIG. 22 has a configuration in which a detection unit 2201 and an upper limb detection unit 2202 are added to the image processing apparatus 201 of FIG. 4, and obtains the position of the hand and the position of the upper limb from the distance image 432.
 FIG. 23 shows an example of the visual field range of the imaging device 401 of FIG. 22. The imaging device 401 of FIG. 23 is installed above the work table 102, and its visual field range 2301 includes the work table 102, the upper body 2302 of the worker 101 including the left hand 502 and the right hand 503, and an object 2303 held by the right hand 503. Therefore, the work table 102, the upper body 2302, and the object 2303 appear in the distance image 432 captured by the imaging device 401.
 The XW axis, the YW axis, and the ZW axis represent the world coordinate system, and its origin O is set on the floor surface of the workplace. The XW axis is parallel to the long side of the work table 102 and represents the direction from the left shoulder toward the right shoulder of the worker 101. The YW axis is parallel to the short side of the work table 102 and represents the direction from the front toward the back of the worker 101. The ZW axis is perpendicular to the floor surface and represents the direction from the floor surface toward the top of the head of the worker 101.
 The detection unit 2201 detects a body region from the distance image 432 using the background distance image 431, and stores region information 2211 indicating the detected body region in the storage unit 412. The body region is a region in which the upper body 2302 of the worker 101 is estimated to appear.
 The upper limb detection unit 2202 identifies the position of the upper limb, including the wrist, elbow, or shoulder, using the distance image 432, the region information 2211, and the center position of the palm, and stores position information 2212 indicating the identified position in the storage unit 412. The output unit 411 outputs a recognition result based on the position information 436 and the position information 2212. The recognition result may be a trajectory indicating changes in the three-dimensional positions of the hand and the upper limb, or may be information indicating the worker's motion estimated from the trajectories of the hand and the upper limb.
 According to the image processing apparatus 201 of FIG. 22, the worker's motion can be recognized in consideration of not only the movement of the hand but also the movement of the upper limb, which further improves the recognition accuracy of the worker's motion.
 FIG. 24 is a flowchart showing a specific example of the image processing performed by the image processing apparatus 201 of FIG. 22. In this image processing, the distance image 432 is divided into two regions: a hand detection region and a body detection region.
 FIG. 25 shows examples of the distance image 432 and the background distance image 431 obtained by capturing the visual field range 2301 of FIG. 23. FIG. 25(a) shows an example of the distance image 432, and FIG. 25(b) shows an example of the background distance image 431. The distance image 432 is divided into a hand detection region 2501 and a body detection region 2502, and the background distance image 431 is similarly divided into two regions.
 First, the detection unit 2201 performs the body region detection process using the body detection region of the distance image 432 (step 2401), and the detection unit 211 performs the candidate region detection process using the hand detection region of the distance image 432 (step 2402). Next, the extraction unit 212 and the specifying unit 213 perform the hand position detection process (step 2403), and the upper limb detection unit 2202 performs the upper limb position detection process (step 2404).
 FIG. 26 is a flowchart showing an example of the body region detection process in step 2401 of FIG. 24. First, the detection unit 2201 subtracts the pixel value of each pixel in the body detection region of the background distance image 431 from the pixel value of the corresponding pixel in the body detection region of the distance image 432 to generate a difference image, and binarizes the generated difference image (step 2601).
 Next, the detection unit 2201 performs opening and closing on each pixel of the binary image (step 2602). The detection unit 2201 then extracts, as the body region, the white region having the largest area among the white regions included in the binary image (step 2603), and obtains the center of gravity of the body region (step 2604). For example, the detection unit 2201 can obtain the coordinates of the center of gravity of the body region by calculating the average of the x coordinates and the average of the y coordinates of the pixels included in the body region.
 Next, the detection unit 2201 obtains the position of the head appearing in the body region (step 2605). For example, the detection unit 2201 generates a histogram of the distance values of the pixels included in the body region, and determines, from the generated histogram, a threshold THD such that the number of pixels having distance values equal to or less than the threshold THD is equal to or larger than a predetermined number. Next, the detection unit 2201 selects, as the head region, the largest of the regions composed of pixels having distance values equal to or greater than the threshold THD, and obtains the coordinates of the center of gravity of the head region as the head position. The detection unit 2201 then generates region information 2211 indicating the coordinates of the centers of gravity of the body region and the head region.
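 For illustration only, the body region detection of FIG. 26 may be sketched as follows; the binarization threshold diff_thresh is hypothetical, since the exact binarization rule is not specified here, and SciPy morphology is used merely as one possible implementation of the opening and closing of step 2602.

import numpy as np
from scipy import ndimage

def detect_body_region(depth, background, diff_thresh):
    diff = depth.astype(np.int32) - background.astype(np.int32)   # step 2601
    binary = np.abs(diff) > diff_thresh                           # binarization (assumed rule)
    binary = ndimage.binary_opening(binary)                       # opening (step 2602)
    binary = ndimage.binary_closing(binary)                       # closing (step 2602)
    labels, n = ndimage.label(binary)                             # connected white regions
    if n == 0:
        return None, None
    sizes = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    body = labels == (int(np.argmax(sizes)) + 1)                  # largest white region (step 2603)
    cy, cx = ndimage.center_of_mass(body)                         # centroid (step 2604)
    return body, (cx, cy)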
 In step 2402 of FIG. 24, the detection unit 211 performs a candidate region detection process similar to that of FIG. 7, using the hand detection regions of the distance image 432 and the background distance image 431.
 FIG. 27 is a flowchart showing an example of the hand position detection process in step 2403 of FIG. 24. The processing of steps 2701 to 2706, 2708, and 2709 of FIG. 27 is the same as the processing of steps 1001 to 1008 of FIG. 10. After performing the length determination process, the determination unit 422 performs a three-dimensional direction determination process (step 2707).
 FIG. 28 is a flowchart showing an example of the three-dimensional direction determination process in step 2707 of FIG. 27. In the three-dimensional direction determination process, line segments that are presumed to lie beyond the hand are excluded based on the result of comparing the direction, in the three-dimensional space, of each line segment with the direction, in the three-dimensional space, of the subject appearing in the region that contains that line segment.
 First, the determination unit 422 sets the control variable j to 0 (step 2801) and compares j with nPts-1 (step 2802). If j is less than nPts-1 (step 2802, YES), the determination unit 422 obtains the coordinates (Xj, Yj, Zj) of pts[j] in the three-dimensional space and the coordinates (Xj+1, Yj+1, Zj+1) of pts[j+1] in the three-dimensional space (step 2803). For example, the determination unit 422 can obtain (Xj, Yj, Zj) and (Xj+1, Yj+1, Zj+1) using the pinhole camera model.
 Next, the determination unit 422 uses (Xj, Yj, Zj) and (Xj+1, Yj+1, Zj+1) to obtain a direction vector Vj indicating the direction, in the three-dimensional space, of the line segment Lj with endpoints pts[j] and pts[j+1] (step 2804). For example, the determination unit 422 can obtain, as Vj, the vector from (Xj, Yj, Zj) toward (Xj+1, Yj+1, Zj+1) in the three-dimensional space. Alternatively, the determination unit 422 may obtain the coordinates, in the three-dimensional space, of a plurality of points on the line segment Lj and obtain, as Vj, a vector that approximates the curve connecting those coordinates.
 Next, the determination unit 422 sets a peripheral region of the line segment Lj within the candidate region, obtains the coordinates, in the three-dimensional space, of a plurality of points within the peripheral region, and performs principal component analysis on those coordinates (step 2805). The determination unit 422 then obtains the vector EVj of the first principal component from the result of the principal component analysis (step 2806). EVj indicates the direction, in the three-dimensional space, of the subject appearing in the region that contains the line segment Lj.
 FIG. 29 shows an example of the peripheral region in the case of j = 3. In this case, the line segment L3 is the line segment with endpoints pts[3] and pts[4]. For example, the determination unit 422 can obtain the normal line passing through pts[3] and the normal line passing through pts[4], and set the region 2901 surrounded by these two normal lines and the contour of the candidate region as the peripheral region of the line segment L3.
 Next, the determination unit 422 obtains the angle γj between Vj and EVj (step 2807) and compares γj with a threshold γth (step 2808). If γj is less than γth (step 2808, NO), the determination unit 422 increments j by 1 (step 2810) and repeats the processing from step 2802. When j reaches nPts-1 (step 2802, NO), the determination unit 422 ends the processing.
 If γj is larger than γth (step 2808, YES), the determination unit 422 deletes the points with array index j+1 and later from the line segment information 434 and changes nPts to j+1 (step 2809). As a result, the line segments corresponding to the fingers or the object are deleted from the line segment information 434.
 According to such a three-dimensional direction determination process, line segments corresponding to the object that were deleted by neither the bending determination process nor the length determination process can be deleted.
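 For illustration only, steps 2804 to 2807 may be sketched as follows; p3d_j and p3d_j1 are the 3D coordinates of pts[j] and pts[j+1], and region_points_3d is a hypothetical (N, 3) array of 3D points sampled from the peripheral region of the line segment Lj. The returned angle would then be compared with γth as in step 2808.

import numpy as np

def direction_mismatch(p3d_j, p3d_j1, region_points_3d):
    v = np.asarray(p3d_j1, dtype=float) - np.asarray(p3d_j, dtype=float)   # Vj
    pts3d = np.asarray(region_points_3d, dtype=float)
    centered = pts3d - pts3d.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # principal component analysis
    ev = vt[0]                                                 # first principal component EVj
    cos = abs(v @ ev) / (np.linalg.norm(v) * np.linalg.norm(ev))   # sign of EVj is arbitrary
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))    # angle between Vj and EVj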
 FIG. 30 shows an example of the first principal component vectors EV0 to EV3. EV0 to EV2 are obtained by principal component analysis on the peripheral regions of the line segments L0 to L2 and correspond to the upper arm, forearm, and hand regions, respectively. In this case, the directions of EV0 to EV2 are close to the directions of the respective direction vectors V0 to V2 of L0 to L2.
 On the other hand, the direction of EV3, which corresponds to the region of the object 3001 held by the hand, differs greatly from the direction of the direction vector V3 of the line segment L3, so the angle γ3 between V3 and EV3 becomes larger than γth. Therefore, pts[4] is deleted from the line segment information 434, and nPts is changed from 5 to 4. As a result, the line segment corresponding to the object 3001 is deleted.
 FIG. 31 is a flowchart showing an example of the upper limb position detection process in step 2404 of FIG. 24. First, the upper limb detection unit 2202 obtains the position, in the world coordinate system, of the center of gravity of the head region indicated by the region information 2211 (step 3101). For example, the upper limb detection unit 2202 can obtain the coordinates in the three-dimensional space from the coordinates of the center of gravity of the head region and its distance value, using the pinhole camera model.
 The upper limb detection unit 2202 then converts the obtained coordinates into the coordinates (XWH, YWH, ZWH) of the world coordinate system shown in FIG. 23, using an RT matrix based on the external parameters of the imaging device 401. The external parameters of the imaging device 401 include the height from the floor surface to the installation position of the imaging device 401 and the tilt angle of the imaging device 401, and the RT matrix represents the rotation and translation between the three-dimensional coordinate system whose origin is the imaging device 401 and the world coordinate system. The obtained ZWH represents the height from the floor surface to the top of the head and corresponds to the approximate height of the worker.
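 For illustration only, the camera-to-world conversion using the RT matrix may be sketched as follows; how the rotation R and translation t are built from the mounting height and tilt angle is assumed here and is not taken from the description.

import numpy as np

def to_world(p_cam, R, t):
    # R: 3x3 rotation, t: 3-vector translation from camera to world coordinates.
    return np.asarray(R, dtype=float) @ np.asarray(p_cam, dtype=float) + np.asarray(t, dtype=float)

# e.g. XWH, YWH, ZWH = to_world(head_center_camera_xyz, R, t)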
 Next, the upper limb detection unit 2202 obtains the coordinates, in the three-dimensional space, of the center of gravity of the body region indicated by the region information 2211, and converts the obtained coordinates into the coordinates (XWB, YWB, ZWB) of the world coordinate system using the RT matrix (step 3102).
 Next, the upper limb detection unit 2202 obtains the approximate positions of both shoulders in the world coordinate system, using ZWH as the height (step 3103). For example, the upper limb detection unit 2202 can determine the ratio of the shoulder height to the height and the ratio of the shoulder width to the height based on the Vitruvian Man proportions, and obtain, from ZWH, the left shoulder height (ZW coordinate) ZWLS, the right shoulder height ZWRS, and the shoulder width SW. The upper limb detection unit 2202 then obtains XWB - SW/2 as the lateral position (XW coordinate) XWLS of the left shoulder, obtains XWB + SW/2 as the lateral position XWRS of the right shoulder, and sets YWB as the YW coordinate of both the left shoulder and the right shoulder.
 The upper limb detection unit 2202 may use XWH and YWH instead of XWB and YWB.
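 For illustration only, the shoulder estimate of step 3103 may be sketched as follows; the two ratio parameters are hypothetical and stand for the Vitruvian Man proportions mentioned above, whose numerical values are not given here.

def shoulder_positions(ZWH, XWB, YWB, shoulder_height_ratio, shoulder_width_ratio):
    ZWLS = ZWRS = ZWH * shoulder_height_ratio   # shoulder heights (ZW coordinates)
    SW = ZWH * shoulder_width_ratio             # shoulder width
    XWLS = XWB - SW / 2.0                       # left shoulder lateral position (XW)
    XWRS = XWB + SW / 2.0                       # right shoulder lateral position (XW)
    return (XWLS, YWB, ZWLS), (XWRS, YWB, ZWRS)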
 Next, the upper limb detection unit 2202 sets the control variable i, which indicates the i-th candidate region, to 0 (step 3104) and compares i with nHands (step 3105). If i is less than nHands (step 3105, YES), the upper limb detection unit 2202 obtains the wrist position in the i-th candidate region based on the palm center position obtained by the specifying unit 213 (step 3106). The upper limb detection unit 2202 then converts the coordinates of the wrist position into the coordinates of the world coordinate system using the RT matrix.
 For example, the upper limb detection unit 2202 can determine, as the wrist position, the point that, among pts[0] to pts[nPts-1] indicated by the line segment information 434, lies on the side closer to the lower end of the candidate region (the body side) than the palm center position and is closest to the palm center position. Alternatively, the upper limb detection unit 2202 may determine, as the wrist position, a point on the line segments located a predetermined distance away from the palm center position.
 FIG. 32 shows an example of the wrist position. In this example, among pts[0] to pts[2], which lie closer to the lower end of the candidate region than the palm center position 3201, pts[2], which is closest to the palm center position 3201, is determined as the wrist position.
 Next, the upper limb detection unit 2202 obtains the elbow position in the i-th candidate region based on the wrist position, and converts the coordinates of the elbow position into the coordinates of the world coordinate system using the RT matrix (step 3107).
 For example, the upper limb detection unit 2202 can determine the ratio of the forearm length to the height based on the Vitruvian Man proportions, obtain the forearm length from ZWH, and convert that length into a forearm length on the image in the candidate region. The upper limb detection unit 2202 then obtains the point that is separated from the wrist position by the forearm length on the image toward the side closer to the lower end of the candidate region, and determines, as the elbow position, the point among pts[0] to pts[nPts-1] that lies within a predetermined error range from the obtained point. For example, in the example of FIG. 32, pts[1] is determined as the elbow position.
 If no point indicated by the line segment information 434 lies within the predetermined error range, the upper limb detection unit 2202 may obtain the intersection between the line segments indicated by the line segment information 434 and a circle whose center is the wrist position and whose radius is the forearm length on the image, and determine the obtained intersection as the elbow position. If the elbow does not appear in the hand detection region of the distance image 432 and no intersection between the circle and the line segments exists, the upper limb detection unit 2202 determines, as the elbow position, the point on the extension of the line segment connecting the wrist position and pts[0] that is separated from the wrist position by the forearm length on the image.
 Next, the upper limb detection unit 2202 corrects the shoulder position based on the elbow position in the world coordinate system (step 3108). For example, the upper limb detection unit 2202 can determine the ratio of the upper arm length to the height based on the Vitruvian Man proportions and obtain the upper arm length from ZWH. The upper limb detection unit 2202 then obtains the three-dimensional distance UALen between the elbow coordinates and the shoulder coordinates in the world coordinate system.
 If UALen exceeds the upper arm length, the upper limb detection unit 2202 moves the shoulder position along the three-dimensional straight line connecting the elbow coordinates and the shoulder coordinates so that UALen matches the upper arm length. On the other hand, if UALen is equal to or less than the upper arm length, the upper limb detection unit 2202 does not correct the shoulder coordinates. The upper limb detection unit 2202 then generates position information 2212 indicating the positions of the wrist, elbow, and shoulder in the world coordinate system.
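 For illustration only, the shoulder correction of step 3108 may be sketched as follows; elbow_w and shoulder_w are 3-vectors in the world coordinate system and upper_arm_len is the upper arm length estimated from the height.

import numpy as np

def correct_shoulder(elbow_w, shoulder_w, upper_arm_len):
    elbow = np.asarray(elbow_w, dtype=float)
    shoulder = np.asarray(shoulder_w, dtype=float)
    ua_len = np.linalg.norm(shoulder - elbow)          # UALen
    if ua_len <= upper_arm_len or ua_len == 0.0:
        return shoulder                                 # no correction needed
    return elbow + (shoulder - elbow) * (upper_arm_len / ua_len)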
 Next, the upper limb detection unit 2202 increments i by 1 (step 3109) and repeats the processing from step 3105. When i reaches nHands (step 3105, NO), the upper limb detection unit 2202 ends the processing.
 In steps 3103, 3107, and 3108, the upper limb detection unit 2202 may use a predetermined value set in advance as the height, instead of ZWH.
 FIG. 33 shows an example of the functional configuration of an image processing system including the image processing apparatus 201 of FIG. 4 or FIG. 22. The image processing system of FIG. 33 includes the image processing apparatus 201 and an image processing apparatus 3301.
 The transmission unit 3311 in the image processing apparatus 201 corresponds to the output unit 411 of FIG. 4 or FIG. 22, and transmits a recognition result based on the position information 436, or a recognition result based on the position information 436 and the position information 2212, to the image processing apparatus 3301 via a communication network. For example, the image processing apparatus 201 generates a plurality of recognition results for a plurality of time periods and transmits those recognition results to the image processing apparatus 3301 in time series.
 The image processing apparatus 3301 includes a reception unit 3321, a display unit 3322, and a storage unit 3323. The reception unit 3321 receives the plurality of recognition results in time series from the image processing apparatus 201, and the storage unit 3323 stores the received recognition results in association with the respective time periods. The display unit 3322 displays the time-series recognition results stored in the storage unit 3323 on a screen.
 The image processing apparatus 201 may be a server installed at a work site such as a factory, or may be a server on a cloud that communicates with the imaging device 401 via a communication network. The image processing apparatus 3301 may be a server or a terminal device of an administrator who monitors the worker's motion.
 The functions of the image processing apparatus 201 of FIG. 4 or FIG. 22 may also be distributed among and implemented by a plurality of apparatuses connected via a communication network. For example, the detection unit 211, the extraction unit 212, the specifying unit 213, the detection unit 2201, and the upper limb detection unit 2202 may be provided in different apparatuses.
 The configurations of the image processing apparatus 201 in FIGS. 2, 4, and 22 are merely examples, and some components may be omitted or changed according to the use or conditions of the image processing apparatus 201.
 The configuration of the image processing system in FIG. 33 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing system.
 The flowcharts of FIGS. 3, 6, 7, 10, 11, 13, 14, 16, 20, 24, 26 to 28, and 31 are merely examples, and some of the processing may be omitted or changed according to the configuration or conditions of the image processing apparatus 201.
 For example, to simplify the candidate region detection process of FIG. 7, the processing of steps 702, 703, and 705 can be omitted. In step 704, the detection unit 211 may select, as the candidate region, a white region that is in contact with the upper end, the left end, or the right end of the binary image. In the hand position detection process of FIG. 10, either the bending determination process of step 1005 or the length determination process of step 1006 may be omitted.
 To simplify the length determination process of FIG. 16, the normal line determination process of steps 1613 to 1620 can be omitted. In the position specifying process of FIG. 20, the specifying unit 213 may determine, for example, the midpoint of the line segment indicated by the hand region line segment information 435 as the palm center position, instead of the center of the maximum inscribed circle.
 To simplify the body region detection process of FIG. 26, the processing of step 2602 can be omitted. In the hand position detection process of FIG. 27, either the bending determination process of step 2705 or the length determination process of step 2706 may be omitted. In the three-dimensional direction determination process of FIG. 28, the determination unit 422 may obtain the vector indicating the direction, in the three-dimensional space, of the subject appearing in the region containing the line segment Lj by a method other than principal component analysis.
 In the upper limb position detection process of FIG. 31, the upper limb detection unit 2202 does not need to obtain all of the shoulder, wrist, and elbow positions. For example, the upper limb detection unit 2202 may generate position information 2212 indicating the position of any one of the shoulder, the wrist, and the elbow.
 The installation position of the three-dimensional distance sensor in FIG. 1 and the installation positions of the imaging device in FIGS. 5 and 23 are merely examples, and the three-dimensional distance sensor or the imaging device may be installed at a position from which the worker can be photographed from another angle.
 The distance images and background distance images in FIGS. 8 and 25 are merely examples, and the distance image and the background distance image change according to the subjects present in the visual field range of the imaging device. The hand detection region and the body detection region in FIG. 25 are merely examples, and a hand detection region and a body detection region at other positions or with other shapes may be used.
 The line segment information in FIG. 12 and the line segments in FIGS. 15, 17 to 19, and 32 are merely examples, and the line segment information and the line segments change according to the captured distance image. The maximum inscribed circle in FIG. 21, the peripheral region in FIG. 29, and the first principal component vectors in FIG. 30 are merely examples, and these also change according to the captured distance image. A peripheral region with another shape may be used.
 FIG. 34 shows a configuration example of an information processing apparatus (computer) used as the image processing apparatus 201 of FIGS. 2, 4, and 22. The information processing apparatus of FIG. 34 includes a Central Processing Unit (CPU) 3401, a memory 3402, an input device 3403, an output device 3404, an auxiliary storage device 3405, a medium driving device 3406, and a network connection device 3407. These components are connected to one another by a bus 3408. The imaging device 401 may be connected to the bus 3408.
 The memory 3402 is, for example, a semiconductor memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), or a flash memory, and stores programs and data used for the processing. The memory 3402 can be used as the storage unit 412.
 The CPU 3401 (processor) operates as the detection unit 211, the extraction unit 212, the specifying unit 213, the line segment detection unit 421, the determination unit 422, the detection unit 2201, and the upper limb detection unit 2202 by, for example, executing programs using the memory 3402.
 The input device 3403 is, for example, a keyboard or a pointing device, and is used to input instructions or information from an operator or a user. The output device 3404 is, for example, a display device, a printer, or a speaker, and is used to output inquiries to the operator or the user or to output processing results. The processing result may be a recognition result based on the position information 436, or a recognition result based on the position information 436 and the position information 2212. The output device 3404 can be used as the output unit 411.
 The auxiliary storage device 3405 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or a tape device. The auxiliary storage device 3405 may be a flash memory or a hard disk drive. The information processing apparatus can store programs and data in the auxiliary storage device 3405 and load them into the memory 3402 for use. The auxiliary storage device 3405 can be used as the storage unit 412.
 The medium driving device 3406 drives a portable recording medium 3409 and accesses its recorded contents. The portable recording medium 3409 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like. The portable recording medium 3409 may be a Compact Disk Read Only Memory (CD-ROM), a Digital Versatile Disk (DVD), a Universal Serial Bus (USB) memory, or the like. The operator or the user can store programs and data in the portable recording medium 3409 and load them into the memory 3402 for use.
 As described above, the computer-readable recording medium that stores the programs and data used for the processing is a physical (non-transitory) recording medium such as the memory 3402, the auxiliary storage device 3405, or the portable recording medium 3409.
 The network connection device 3407 is a communication interface that is connected to a communication network such as a Local Area Network or a Wide Area Network and performs data conversion accompanying communication. The information processing apparatus can receive programs and data from an external apparatus via the network connection device 3407 and load them into the memory 3402 for use. The network connection device 3407 can be used as the output unit 411 or the transmission unit 3311.
 The information processing apparatus need not include all of the components of FIG. 34, and some components may be omitted according to the use or conditions. For example, if there is no need to input instructions or information from the operator or the user, the input device 3403 may be omitted. If the portable recording medium 3409 or the communication network is not used, the medium driving device 3406 or the network connection device 3407 may be omitted.
 The information processing apparatus of FIG. 34 can also be used as the image processing apparatus 3301 of FIG. 33. In this case, the network connection device 3407 is used as the reception unit 3321, the memory 3402 or the auxiliary storage device 3405 is used as the storage unit 3323, and the output device 3404 is used as the display unit 3322.
 Although the disclosed embodiments and their advantages have been described in detail, those skilled in the art will be able to make various changes, additions, and omissions without departing from the scope of the invention as clearly set forth in the claims.
(付記4)
 前記抽出部は、前記第1閾値よりも大きな第2閾値よりも前記角度が小さい場合、前記2本の線分のうち前記端部までの距離が近い方の線分に対応する、3次元空間内の線分の長さを用いて、前記2本の線分のうち前記端部までの距離が遠い方の線分を前記手領域を
近似する線分の候補から除外するか否かを判定することを特徴とする付記2記載の画像処理装置。
(付記5)
 前記抽出部は、前記端部までの距離が最も遠い線分上の複数の点それぞれを通る直線と前記候補領域の輪郭との2つの交点を求め、前記2つの交点の間の距離に対応する、前記3次元空間内の距離に基づいて、前記端部までの距離が最も遠い線分から前記手領域を近似する部分を抽出し、前記特定部は、前記手領域を近似する部分を用いて前記手の位置を特定することを特徴とする付記3又は4記載の画像処理装置。
(付記6)
 前記抽出部は、前記端部までの距離が遠い方の線分の前記3次元空間における方向と、前記端部までの距離が遠い方の線分を含む領域に写っている被写体の前記3次元空間における方向とを求め、求めた2つの方向の間の角度が第3閾値よりも大きい場合、前記端部までの距離が遠い方の線分を前記手領域を近似する線分の候補から除外することを特徴とする付記3乃至5のいずれか1項に記載の画像処理装置。
(付記7)
 前記抽出部は、前記候補領域を細線化することで曲線を求め、前記曲線を近似する複数の線分を、前記候補領域を近似する前記複数の線分として求めることを特徴とする付記1乃至6のいずれか1項に記載の画像処理装置。
(付記8)
 前記距離画像と前記手の位置とを用いて、手首、肘、又は肩の位置を求める上肢検出部をさらに備えることを特徴とする付記1乃至7のいずれか1項に記載の画像処理装置。
(付記9)
 前記第1閾値は、手首又は肘の関節の折れ曲がり角度の下限値を表すことを特徴とする付記2乃至8のいずれか1項に記載の画像処理装置。
(付記10)
 距離画像から、手が写っている候補領域を検出する検出部と、
 前記候補領域を近似する複数の線分のうち隣接する2本の線分の間の角度に基づいて、前記複数の線分の中から手領域を近似する線分を抽出する抽出部と、
 前記手領域を近似する線分を用いて、前記候補領域内における前記手の位置を特定する特定部と、
 前記手の位置を示す位置情報を表示する表示部と、
を備えることを特徴とする画像処理システム。
(付記11)
 距離画像から、手が写っている候補領域を検出し、
 前記候補領域を近似する複数の線分のうち隣接する2本の線分の間の角度に基づいて、前記複数の線分の中から手領域を近似する線分を抽出し、
 前記手領域を近似する線分を用いて、前記候補領域内における前記手の位置を特定する、
処理をコンピュータに実行させるための画像処理プログラム。
(付記12)
 前記コンピュータは、前記距離画像の端部に接する領域を前記候補領域として検出し、前記角度が第1閾値よりも小さい場合、前記端部までの距離が前記2本の線分よりも遠い線分を、前記手領域を近似する線分の候補から除外し、残された線分のうち前記端部までの距離が最も遠い線分を、前記手領域を近似する線分として抽出することを特徴とする付記11記載の画像処理プログラム。
(付記13)
 コンピュータが、
 距離画像から、手が写っている候補領域を検出し、
 前記候補領域を近似する複数の線分のうち隣接する2本の線分の間の角度に基づいて、前記複数の線分の中から手領域を近似する線分を抽出し、
 前記手領域を近似する線分を用いて、前記候補領域内における前記手の位置を特定する、
ことを特徴とする画像処理方法。
(付記14)
 前記コンピュータは、前記距離画像の端部に接する領域を前記候補領域として検出し、前記角度が第1閾値よりも小さい場合、前記端部までの距離が前記2本の線分よりも遠い線分を、前記手領域を近似する線分の候補から除外し、残された線分のうち前記端部までの距離が最も遠い線分を、前記手領域を近似する線分として抽出することを特徴とする付記13記載の画像処理方法。
With respect to the embodiment described with reference to FIGS. 1 to 34, the following additional notes are disclosed.
(Appendix 1)
A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
An image processing apparatus comprising:
(Appendix 2)
The detection unit detects a region in contact with the end of the distance image as the candidate region, and the extraction unit determines that the distance to the end is the two lines when the angle is smaller than a first threshold. A line segment that is farther than the minute is excluded from line segment candidates that approximate the hand region, and a line segment that is the farthest to the end of the remaining line segments is a line segment that approximates the hand region. The image processing apparatus according to appendix 1, wherein the image processing apparatus is extracted as:
(Appendix 3)
In the extraction unit, a first line segment of the two line segments is in contact with the end part, and a second line segment of the two line segments is adjacent to a third line segment, The third line segment is adjacent to the fourth line segment, and an angle between the first line segment and the second line segment is the first threshold value and a second threshold value that is larger than the first threshold value. An angle range between the second line segment and the third line segment is included in the angle range, and an angle between the third line segment and the fourth line segment is the angle range. The image processing apparatus according to appendix 2, wherein the fourth line segment is excluded from line segment candidates that approximate the hand region.
(Appendix 4)
When the angle is smaller than a second threshold value that is larger than the first threshold value, the extraction unit corresponds to a line segment having a shorter distance to the end portion among the two line segments. Using the length of the line segment, it is determined whether to exclude the line segment that is farther from the end of the two line segments from the line segment candidates that approximate the hand region. The image processing apparatus according to appendix 2, wherein:
(Appendix 5)
The extraction unit obtains two intersections between a straight line passing through each of a plurality of points on a line segment having the longest distance to the end and the contour of the candidate area, and corresponds to the distance between the two intersections. Then, based on the distance in the three-dimensional space, the part that approximates the hand region is extracted from the line segment that is the farthest to the end, and the specifying unit uses the part that approximates the hand region. The image processing apparatus according to appendix 3 or 4, wherein the position of the hand is specified.
(Appendix 6)
The extraction unit is configured to detect the three-dimensional object captured in a region including a direction in the three-dimensional space of a line segment that is farther to the end and a line segment that is farther to the end. If the angle between the two directions is greater than the third threshold value, the line segment that is farther to the end is excluded from the line segment candidates that approximate the hand region. The image processing apparatus according to any one of appendices 3 to 5, wherein
(Appendix 7)
The extraction unit obtains a curve by thinning the candidate area, and obtains a plurality of line segments approximating the curve as the plurality of line segments approximating the candidate area. The image processing apparatus according to any one of claims 6 to 6.
(Appendix 8)
The image processing apparatus according to any one of appendices 1 to 7, further comprising an upper limb detection unit that obtains a position of a wrist, an elbow, or a shoulder using the distance image and the position of the hand.
(Appendix 9)
9. The image processing device according to any one of appendices 2 to 8, wherein the first threshold value represents a lower limit value of a bending angle of a wrist or elbow joint.
(Appendix 10)
A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
A display unit for displaying position information indicating the position of the hand;
An image processing system comprising:
(Appendix 11)
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing program for causing a computer to execute processing.
(Appendix 12)
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing program according to appendix 11.
(Appendix 13)
Computer
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing method.
(Appendix 14)
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing method according to appendix 13.
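To make the geometry in appendices 1, 2, and 9 concrete, the sketch below applies the appendix 2 selection rule to a polyline that approximates the candidate region: at a joint whose interior angle falls below the first threshold (which, per appendix 9, represents a lower limit of the bending angle of a wrist or elbow joint), every segment lying farther from the image edge than the two segments meeting at that joint is excluded, and the farthest remaining segment is returned. This is a minimal, non-normative sketch: the angle convention (interior angle at the shared vertex, 180 degrees meaning straight), the 90-degree default value, the (x, y)-tuple polyline representation, the choice to stop at the first qualifying joint encountered from the edge side, and the function names are assumptions for illustration, not details given in the disclosure.

```python
import math

def interior_angle(p_prev, p_joint, p_next):
    """Interior angle (degrees) at p_joint between the adjacent segments
    (p_prev, p_joint) and (p_joint, p_next); 180 degrees means no bend."""
    ux, uy = p_prev[0] - p_joint[0], p_prev[1] - p_joint[1]
    vx, vy = p_next[0] - p_joint[0], p_next[1] - p_joint[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0.0:
        return 180.0  # degenerate joint: treat as straight
    cos_a = max(-1.0, min(1.0, (ux * vx + uy * vy) / norm))
    return math.degrees(math.acos(cos_a))

def extract_hand_segment(polyline, first_threshold_deg=90.0):
    """Apply the appendix 2 rule to a polyline approximating the candidate region.

    polyline: vertices (x, y) ordered from the vertex touching the image edge
    toward the far end of the candidate region, so that segment index also
    orders distance to the edge.  At the first joint (walking outward from the
    edge) whose interior angle is below the first threshold, every segment
    lying farther from the edge than the two segments meeting at that joint is
    excluded; the index of the farthest remaining segment is returned as the
    segment approximating the hand region.
    """
    n_segments = len(polyline) - 1
    kept = n_segments                       # keep every segment by default
    for j in range(1, n_segments):          # j indexes interior vertices
        angle = interior_angle(polyline[j - 1], polyline[j], polyline[j + 1])
        if angle < first_threshold_deg:
            kept = j + 1                    # keep segments 0 .. j only
            break
    return kept - 1                         # farthest segment still kept
```

For instance, given five vertices whose only sub-threshold interior angle occurs at the third vertex, the last segment is excluded and the function returns index 2, the farthest segment that is kept.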
DESCRIPTION OF SYMBOLS
101 Worker
102 Worktable
103, 2303 Object
111 Three-dimensional distance sensor
112, 501, 2301, 3001 Field of view range
201, 3301 Image processing device
211, 2201 Detection unit
212 Extraction unit
213 Specifying unit
401 Imaging device
411 Output unit
412, 3323 Storage unit
421 Line segment detection unit
422 Determination unit
431 Background distance image
432 Distance image
433, 2211 Region information
434 Line segment information
435 Hand region line segment information
436, 2212 Position information
502 Left hand
503 Right hand
2101 Line segment
2102 Point
2103 Maximum inscribed circle
2202 Upper limb detection unit
2302 Upper body
2501 Hand detection region
2502 Body detection region
2901 Region
3201 Center position
3311 Transmission unit
3321 Reception unit
3322 Display unit
3401 CPU
3402 Memory
3403 Input device
3404 Output device
3405 Auxiliary storage device
3406 Medium driving device
3407 Network connection device
3408 Bus
3409 Portable recording medium

Claims (14)

1. An image processing apparatus comprising:
   a detection unit that detects, from a distance image, a candidate region in which a hand appears;
   an extraction unit that extracts a line segment approximating a hand region from among a plurality of line segments approximating the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; and
   a specifying unit that specifies a position of the hand within the candidate region by using the line segment approximating the hand region.
2. The image processing apparatus according to claim 1, wherein the detection unit detects, as the candidate region, a region in contact with an edge of the distance image, and, when the angle is smaller than a first threshold, the extraction unit excludes, from candidates for the line segment approximating the hand region, a line segment whose distance to the edge is greater than the distances of the two line segments, and extracts, as the line segment approximating the hand region, the line segment whose distance to the edge is greatest among the remaining line segments.
3. The image processing apparatus according to claim 2, wherein, when a first line segment of the two line segments is in contact with the edge, a second line segment of the two line segments is adjacent to a third line segment, the third line segment is adjacent to a fourth line segment, the angle between the first line segment and the second line segment falls within an angle range between the first threshold and a second threshold larger than the first threshold, an angle between the second line segment and the third line segment falls within the angle range, and an angle between the third line segment and the fourth line segment falls within the angle range, the extraction unit excludes the fourth line segment from the candidates for the line segment approximating the hand region.
4. The image processing apparatus according to claim 2, wherein, when the angle is smaller than a second threshold that is larger than the first threshold, the extraction unit determines whether or not to exclude, from the candidates for the line segment approximating the hand region, the one of the two line segments whose distance to the edge is greater, by using a length, in a three-dimensional space, of a line segment corresponding to the one of the two line segments whose distance to the edge is smaller.
5. The image processing apparatus according to claim 3 or 4, wherein the extraction unit obtains, for each of a plurality of points on the line segment whose distance to the edge is greatest, two intersections between a straight line passing through the point and a contour of the candidate region, and extracts a portion approximating the hand region from the line segment whose distance to the edge is greatest, based on distances in the three-dimensional space corresponding to the distances between the two intersections, and the specifying unit specifies the position of the hand by using the portion approximating the hand region.
6. The image processing apparatus according to any one of claims 3 to 5, wherein the extraction unit obtains a direction, in the three-dimensional space, of the line segment whose distance to the edge is greater and a direction, in the three-dimensional space, of a subject appearing in a region that includes that line segment, and, when an angle between the two obtained directions is larger than a third threshold, excludes the line segment whose distance to the edge is greater from the candidates for the line segment approximating the hand region.
7. The image processing apparatus according to any one of claims 1 to 6, wherein the extraction unit obtains a curve by thinning the candidate region and obtains a plurality of line segments approximating the curve as the plurality of line segments approximating the candidate region.
8. The image processing apparatus according to any one of claims 1 to 7, further comprising an upper limb detection unit that obtains a position of a wrist, an elbow, or a shoulder by using the distance image and the position of the hand.
9. The image processing apparatus according to any one of claims 2 to 8, wherein the first threshold represents a lower limit of a bending angle of a wrist or elbow joint.
10. An image processing system comprising:
    a detection unit that detects, from a distance image, a candidate region in which a hand appears;
    an extraction unit that extracts a line segment approximating a hand region from among a plurality of line segments approximating the candidate region, based on an angle between two adjacent line segments among the plurality of line segments;
    a specifying unit that specifies a position of the hand within the candidate region by using the line segment approximating the hand region; and
    a display unit that displays position information indicating the position of the hand.
11. An image processing program for causing a computer to execute a process, the process comprising:
    detecting, from a distance image, a candidate region in which a hand appears;
    extracting a line segment approximating a hand region from among a plurality of line segments approximating the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; and
    specifying a position of the hand within the candidate region by using the line segment approximating the hand region.
12. The image processing program according to claim 11, wherein the computer detects, as the candidate region, a region in contact with an edge of the distance image, and, when the angle is smaller than a first threshold, excludes, from candidates for the line segment approximating the hand region, a line segment whose distance to the edge is greater than the distances of the two line segments, and extracts, as the line segment approximating the hand region, the line segment whose distance to the edge is greatest among the remaining line segments.
13. An image processing method comprising:
    detecting, by a computer, from a distance image, a candidate region in which a hand appears;
    extracting, by the computer, a line segment approximating a hand region from among a plurality of line segments approximating the candidate region, based on an angle between two adjacent line segments among the plurality of line segments; and
    specifying, by the computer, a position of the hand within the candidate region by using the line segment approximating the hand region.
14. The image processing method according to claim 13, wherein the computer detects, as the candidate region, a region in contact with an edge of the distance image, and, when the angle is smaller than a first threshold, excludes, from candidates for the line segment approximating the hand region, a line segment whose distance to the edge is greater than the distances of the two line segments, and extracts, as the line segment approximating the hand region, the line segment whose distance to the edge is greatest among the remaining line segments.
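Claim 7 states that the extraction unit obtains a curve by thinning the candidate region and then obtains a plurality of line segments approximating that curve. One common way to realize the approximation step is the Ramer-Douglas-Peucker simplification sketched below; this specific algorithm, the pixel tolerance, the function names, and the assumption that an ordered list of (x, y) samples along the thinned, one-pixel-wide curve is already available (for example from a separate skeleton-tracing step, not shown) are illustrative choices, not requirements of the claim.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    if a == b:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
    return num / math.hypot(b[0] - a[0], b[1] - a[1])

def approximate_polyline(points, tolerance=2.0):
    """Ramer-Douglas-Peucker simplification of an ordered curve.

    points: (x, y) samples along the thinned curve, ordered from one end to
    the other.  Returns the vertices of a polyline whose segments stay within
    `tolerance` pixels of the sampled curve; consecutive vertex pairs are the
    plurality of line segments approximating the candidate region.
    """
    if len(points) < 3:
        return list(points)
    # Find the sample farthest from the chord joining the two endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        return [points[0], points[-1]]     # the chord is a good enough fit
    # Otherwise split at the farthest sample and simplify both halves.
    left = approximate_polyline(points[:index + 1], tolerance)
    right = approximate_polyline(points[index:], tolerance)
    return left[:-1] + right               # drop the duplicated split point
```

The resulting vertex list can be passed to a selection routine such as the extract_hand_segment sketch given after the supplementary notes above.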
PCT/JP2018/000120 2017-01-17 2018-01-05 Image processing device, image processing system, image processing program, and image processing method WO2018135326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017005872A JP2018116397A (en) 2017-01-17 2017-01-17 Image processing device, image processing system, image processing program, and image processing method
JP2017-005872 2017-01-17

Publications (1)

Publication Number Publication Date
WO2018135326A1 true WO2018135326A1 (en) 2018-07-26

Family

ID=62907902

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000120 WO2018135326A1 (en) 2017-01-17 2018-01-05 Image processing device, image processing system, image processing program, and image processing method

Country Status (2)

Country Link
JP (1) JP2018116397A (en)
WO (1) WO2018135326A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222379A (en) * 2018-11-27 2020-06-02 株式会社日立制作所 Hand detection method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7375456B2 (en) 2019-10-18 2023-11-08 株式会社アイシン Toe position estimation device and fingertip position estimation device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009294843A (en) * 2008-06-04 2009-12-17 Tokai Rika Co Ltd Operator discrimination apparatus and operator discrimination method
JP2012123667A (en) * 2010-12-09 2012-06-28 Panasonic Corp Attitude estimation device and attitude estimation method
JP2012133665A (en) * 2010-12-22 2012-07-12 Sogo Keibi Hosho Co Ltd Held object recognition device, held object recognition method and held object recognition program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ITOH, MASATSUGU ET AL.: "Hand and object tracking system for presentation scenes", PROCEEDINGS OF THE INFORMATION AND SYSTEMS SOCIETY CONFERENCE OF IEICE 2001, 29 August 2001 (2001-08-29) *
IWATA, TATSUAKI ET AL.: "3-D information input system based on hand motion recognition by image sequence processing", TECHNICAL REPORT OF IEICE, vol. 100, no. 634, 16 February 2001 (2001-02-16), pages 29 - 36 *

Also Published As

Publication number Publication date
JP2018116397A (en) 2018-07-26

Similar Documents

Publication Publication Date Title
Ma et al. Kinect sensor-based long-distance hand gesture recognition and fingertip detection with depth information
JP4625074B2 (en) Sign-based human-machine interaction
JP4934220B2 (en) Hand sign recognition using label assignment
US11847803B2 (en) Hand trajectory recognition method for following robot based on hand velocity and trajectory distribution
KR101612605B1 (en) Method for extracting face feature and apparatus for perforimg the method
CN111844019A (en) Method and device for determining grabbing position of machine, electronic device and storage medium
JPWO2009147904A1 (en) Finger shape estimation device, finger shape estimation method and program
Papanikolopoulos Selection of features and evaluation of visual measurements during robotic visual servoing tasks
CN104866824A (en) Manual alphabet identification method based on Leap Motion
JP2016014954A (en) Method for detecting finger shape, program thereof, storage medium of program thereof, and system for detecting finger shape
CN116249607A (en) Method and device for robotically gripping three-dimensional objects
WO2018135326A1 (en) Image processing device, image processing system, image processing program, and image processing method
JP2009216503A (en) Three-dimensional position and attitude measuring method and system
KR101706864B1 (en) Real-time finger and gesture recognition using motion sensing input devices
JP2021000694A (en) Device for teaching robot and robot system
JP2020179441A (en) Control system, information processing device and control method
JP5083715B2 (en) 3D position and orientation measurement method and apparatus
Gao et al. Parallel dual-hand detection by using hand and body features for robot teleoperation
CN111598172A (en) Dynamic target grabbing posture rapid detection method based on heterogeneous deep network fusion
JP6393495B2 (en) Image processing apparatus and object recognition method
Kim et al. Visual multi-touch air interface for barehanded users by skeleton models of hand regions
US20200338764A1 (en) Object detection method and robot system
JP2019159470A (en) Estimation device, estimation method and estimation program
Bhuyan et al. Hand gesture recognition and animation for local hand motions
CN109934155B (en) Depth vision-based collaborative robot gesture recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18741358

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18741358

Country of ref document: EP

Kind code of ref document: A1