WO2018135326A1

WO2018135326A1 - Image processing device, image processing system, image processing program, and image processing method

Info

Publication number: WO2018135326A1
Application number: PCT/JP2018/000120
Authority: WO
Inventors: 源太鈴木
Original assignee: 富士通株式会社
Priority date: 2017-01-17
Filing date: 2018-01-05
Publication date: 2018-07-26
Also published as: JP2018116397A

Abstract

[Problem] To identify with good precision the position of a hand gripping an object from a distance image of the hand. [Solution] A detection unit 211 detects from a distance image a candidate region in which a hand is captured. On the basis of angles between two adjacent line segments among a plurality of line segments that approximate the candidate region, an extraction unit 212 extracts line segments which approximate a hand region from among the plurality of line segments that approximate the candidate region. Using the line segments that approximate the hand region, an identification unit 213 identifies the position of the hand in the candidate region.

Description

Image processing apparatus, image processing system, image processing program, and image processing method

The present invention relates to an image processing apparatus, an image processing system, an image processing program, and an image processing method.

In recent years, with the widespread use of three-dimensional distance sensors, many techniques have been developed that detect the user's skeleton, joint positions, and the like from distance images and recognize gestures and other actions based on the detection results (for example, Patent Document 1). And Patent Document 2). The distance sensor and the distance image may be referred to as a depth sensor and a depth image, respectively.

A method of thinning a line of a binarized image, a method of converting continuous points into a plurality of approximate line segments, a method of obtaining an angle composed of three points, a prescribed value of the size of the human body, etc. are also known (for example, Non-Patent Document 1 to Non-Patent Document 4).

US Patent Application Publication No. 2011/0085705 Japanese Patent Laying-Open No. 2015-94828

Automatic detection of work mistakes and correct work movement if the movement of the worker's hand at the work site can be detected using an environment-installed camera or a three-dimensional distance sensor to recognize the worker's movement. Education can be realized. Thereby, work efficiency and work quality can be improved.

However, in an environment where there are work objects, tools, and other objects around the worker, such as assembly work on a factory belt conveyor line, the computer is shown in the image in an environment where the worker grips the object and works. It is difficult to distinguish an operator from an object. For this reason, it becomes difficult to recognize the movement of the operator from the three-dimensional position of the hand.

Note that such a problem occurs not only when a worker is holding a work object or tool at a factory, but also when a person is holding an object in another environment.

In one aspect, an object of the present invention is to accurately identify the position of a hand from a distance image of a hand holding an object.

In one proposal, the image processing apparatus includes a detection unit, an extraction unit, and a specification unit. The detection unit detects a candidate area where a hand is captured from the distance image. The extraction unit extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region. To extract. The specifying unit specifies the position of the hand in the candidate area using a line segment that approximates the hand area.

According to the embodiment, the position of the hand can be accurately identified from the distance image of the hand holding the object.

It is a figure which shows a three-dimensional distance sensor. It is a functional block diagram of an image processing apparatus. It is a flowchart of an image process. It is a functional block diagram which shows the 1st specific example of an image processing apparatus. It is a figure which shows the visual field range of an imaging device. It is a flowchart which shows the specific example of an image process. It is a flowchart of a candidate area | region detection process. It is a figure which shows a distance image and a background distance image. It is a figure which shows the binarized difference image. It is a flowchart of a hand position detection process. It is a flowchart of a line segment detection process. It is a figure which shows line segment information. It is a flowchart of a parameter calculation process. It is a flowchart of a bending determination process. It is a figure which shows four line segments used as the object of bending determination processing. It is a flowchart of a length determination process. It is a figure which shows four line segments used as the object of length determination processing. It is a figure which shows three line segments used as the object of length determination processing. It is a figure which shows the line segment used as the object of a normal line determination process. It is a flowchart of a position specific process. It is a figure which shows the maximum inscribed circle. It is a functional block diagram which shows the 2nd specific example of an image processing apparatus. It is a figure which shows the visual field range of an imaging device in the case of calculating | requiring the position of an upper limb. It is a flowchart which shows the specific example of the image process which calculates | requires the position of an upper limb. It is a figure which shows the distance image in the case of calculating | requiring the position of an upper limb, and a background distance image. It is a flowchart of a body region detection process. It is a flowchart of a hand position detection process including a three-dimensional direction determination process. It is a flowchart of a three-dimensional direction determination process. It is a figure which shows a peripheral region. It is a figure which shows the vector of a 1st main component. It is a flowchart of an upper limb position detection process. It is a figure which shows the position of a wrist. It is a functional block diagram of an image processing system. It is a block diagram of information processing apparatus.

Hereinafter, embodiments will be described in detail with reference to the drawings.
FIG. 1 shows an example of a three-dimensional distance sensor installed in an assembly line. The worker 101 performs product assembly work while holding an object 103 such as a tool or a part on the work table 102. The field of view 112 of the three-dimensional distance sensor 111 installed above the work table 102 includes the hand of the worker 101, and the three-dimensional distance sensor 111 obtains a distance image showing the hand during work. can do.

However, if the worker 101 and the work table 102 or the object 103 are close to each other, or if the worker 101 is holding the object 103, the difference between the distance value of the hand and the distance value of the object 103 in the distance image is obtained. Get smaller. For this reason, it is difficult to separate the hand and the object 103 by the method of separating the foreground and the background based on the distance value. Even if the hand region is detected based on the difference between the captured distance image and the background distance image of only the background, both the hand and the object 103 are detected as the background difference, and therefore both are separated. It is difficult.

For example, in the technique disclosed in Patent Document 1, a skeleton of a user's body part is recognized by tracking using machine learning, and a tool held by the user is recognized by tracking in a passive method or active method. . In passive tracking, an infrared retro-reflective marker is attached to a tool, and the tool is recognized by detecting the marker with an external device such as a camera. On the other hand, in active tracking, a three-dimensional position sensor, an acceleration sensor, and the like are built in the tool, and these sensors notify the tracking system of the position of the tool.

However, with this technology, it is difficult to correctly recognize the position of the object and the hand when the distance between the object and the hand is short in an operation using an unknown object on which no marker or sensor is mounted. Also, when recognizing a skeleton using machine learning, it is desirable to learn the state of grasping the object in advance, but for all objects of unknown shape and unknown size, grasp the object. It is not realistic to learn the state in advance.

FIG. 2 shows a functional configuration example of the image processing apparatus. The image processing apparatus 201 in FIG. 2 includes a detection unit 211, an extraction unit 212, and a specification unit 213.

FIG. 3 is a flowchart illustrating an example of image processing performed by the image processing apparatus 201 in FIG. First, the detection unit 211 detects a candidate area where a hand is shown from the distance image (step 301). Next, the extraction unit 212 selects a hand region from a plurality of line segments that approximate the candidate region based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region. An approximate line segment is extracted (step 302). Then, the specifying unit 213 specifies the position of the hand in the candidate area using a line segment that approximates the hand area (step 303).

According to such an image processing apparatus 201, the position of the hand can be accurately identified from the distance image of the hand holding the object.

FIG. 4 shows a first specific example of the image processing apparatus 201 of FIG. The image processing apparatus 201 in FIG. 4 includes a detection unit 211, an extraction unit 212, a specification unit 213, an output unit 411, and a storage unit 412. The extraction unit 212 includes a line segment detection unit 421 and a determination unit 422. The imaging device 401 is an example of a three-dimensional distance sensor. The imaging device 401 captures a distance image 432 including a pixel value representing the distance from the imaging device 401 to the subject and outputs the captured image to the image processing device 201. As the imaging device 401, for example, an infrared camera can be used.

FIG. 5 shows an example of the field of view range of the imaging apparatus 401. The imaging device 401 in FIG. 5 is installed above the workbench 102, and the visual field range 501 includes the object 103 held by the workbench 102, the left hand 502 and the right hand 503 of the worker 101, and the right hand 503. It is included. Therefore, the work table 102, the left hand 502, the right hand 503, and the object 103 are shown in the distance image 432 captured by the imaging device 401.

The storage unit 412 stores a background distance image 431 and a distance image 432. The background distance image 431 is a distance image captured in advance in a state where the left hand 502, the right hand 503, and the object 103 are not included in the visual field range 501.

The detecting unit 211 detects a candidate area from the distance image 432 using the background distance image 431 and stores area information 433 indicating the detected candidate area in the storage unit 412. The candidate area is an area in which at least one of the left hand 502 or the right hand 503 of the worker 101 is estimated to be captured. For example, an area in contact with the end of the distance image 432 is detected as a candidate area.

The line segment detection unit 421 obtains a curve by thinning the candidate area indicated by the area information 433, and obtains a plurality of connected line segments that approximate the curve as a plurality of line segments that approximate the candidate area. Then, the line segment detection unit 421 stores line segment information 434 indicating those line segments in the storage unit 412.

The determination unit 422 obtains the angle of the line segment for each combination of two adjacent line segments included in the plurality of line segments indicated by the line segment information 434, and the line segment in the three-dimensional space corresponding to each line segment. Find the length of. Then, the determination unit 422 determines whether each line segment corresponds to the hand area using the obtained angle and length, and approximates the hand area from the plurality of line segments indicated by the line segment information 434. Extract line segments.

At this time, when the angle between the two line segments is smaller than the threshold θ _min , the determination unit 422 selects another line segment whose distance to the end of the distance image 432 is farther than those line segments. Exclude the region from the approximate line segments. θ _min may be a threshold value representing the lower limit value of the bending angle of the wrist or elbow joint.

In addition, when the angle between the two line segments is larger than the threshold value θ _{min and} smaller than the threshold value θ _max , the determination unit 422 is far from the end of the distance image 432 among those line segments. It is determined whether or not the other line segment is excluded from the line segment candidates that approximate the hand region. For this determination, the length of the line segment in the three-dimensional space corresponding to the line segment closer to the end of the distance image 432 is used.

Next, the determination unit 422 determines the line based on the distance from each of a plurality of points on the line segment that is the farthest to the end of the distance image 432 from the remaining line segments to the contour of the candidate region. Extract the part that approximates the hand area from the minutes. Then, the determination unit 422 stores hand region line segment information 435 indicating the extracted portion in the storage unit 412.

The specifying unit 213 uses the contour of the candidate area indicated by the area information 433 and the line segment indicated by the hand area line segment information 435 to specify the position of the hand in the candidate area, and indicates the specified position. Information 436 is stored in the storage unit 412. The output unit 411 outputs a recognition result based on the position information 436.

The output unit 411 may be a display unit that displays the recognition result on the screen, or may be a transmission unit that transmits the recognition result to another image processing apparatus. The recognition result may be a trajectory indicating a change in the three-dimensional position of the hand, or information indicating an operator's motion estimated from the hand trajectory.

According to the image processing apparatus 201 in FIG. 4, the object 103 is detected based on the angle of the line segment and the length of the line segment in the three-dimensional space from the line segments approximating the candidate area detected from the distance image 432. Corresponding line segments can be excluded. Then, by identifying the position of the hand within the remaining line segment including the line segment that is the farthest from the end of the distance image 432, even if the object 103 is an unknown object, the right hand The position of 503 can be specified with high accuracy. Therefore, the recognition accuracy of the position of the hand in proximity to the unknown object is improved, and the recognition accuracy of the operator's motion is also improved.

FIG. 6 is a flowchart showing a specific example of image processing performed by the image processing apparatus 201 in FIG. First, the detection unit 211 performs candidate area detection processing (step 601), and then the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 602).

FIG. 7 is a flowchart showing an example of candidate area detection processing in step 601 of FIG. First, the detection unit 211 subtracts the pixel value of each pixel of the background distance image 431 from the pixel value (distance value) of each pixel of the distance image 432, generates a difference image, and binarizes the generated difference image. (Step 701).

FIG. 8 shows an example of the distance image 432 and the background distance image 431. FIG. 8A shows an example of the distance image 432, and FIG. 8B shows an example of the background distance image 431. Since the pixel values of the distance image 432 and the background distance image 431 represent the distance from the imaging device 401 to the subject, the pixel values are smaller as the subject is closer to the imaging device 401 and larger as the subject is farther from the imaging device 401. In this case, the difference between the background pixel values common to the distance image 432 and the background distance image 431 is close to 0, but the difference between the foreground pixel values closer to the imaging device 401 than the background is a negative value.

Therefore, the detection unit 211 compares the pixel value difference with T1 using a negative predetermined value as the threshold T1, and if the difference is less than T1, sets the pixel value of the difference image to 255 (white), If the difference is equal to or greater than T1, the pixel value of the difference image is set to 0 (black). Thereby, the difference image is binarized.

FIG. 9 shows an example of a binarized difference image. FIG. 9A shows an example of a binary image generated in step 701. When the difference image is binarized, not only the pixel value of both hands of the operator but also the pixel value of an object close to the hand is set to white.

Next, the detection unit 211 performs opening and closing for each pixel of the binary image (step 702). First, by performing the opening, white pixels are reduced and small white areas are removed. Thereafter, by performing closing, the black isolated point generated by the opening can be changed to white. Thereby, the binary image of FIG. 9B is generated from the binary image of FIG.

Next, the detection unit 211 selects each white pixel as a target pixel, and calculates a difference between the pixel value in the distance image 432 of the target pixel and the pixel value in the distance image 432 of white pixels adjacent to the target pixel in the upper, lower, left, and right directions. Ask. Then, the detection unit 211 compares the maximum absolute value of the difference with a predetermined threshold T2, and when the maximum value is equal to or greater than T2, changes the target pixel from a white pixel to a black pixel (step 703). Thereby, the binary image of FIG. 9C is generated from the binary image of FIG. 9B.

Next, the detection unit 211 performs contour detection processing to detect a plurality of white areas, and selects a white area that satisfies the following condition as a candidate area from the detected white areas (step 704).
(1) The white area is in contact with the end of the binary image.
(2) The area of the white region is not less than a predetermined value.
(3) A maximum of two white regions from the larger area.

For example, in the case of the distance image 432 illustrated in FIG. 8A, the operator's arm extends from the lower end (body side) of the image toward the upper end, and thus a white region in contact with the lower end of the binary image is a candidate. Selected as a region. In this case, one white region shown in FIG. 9D is selected from the binary image shown in FIG.

Next, the detection unit 211 smoothes the outline of the selected white area (step 705). In step 703, the outline of the white region may be uneven due to the influence of wrinkles on clothes worn by the worker. Therefore, by performing the processing in step 705, such unevenness can be smoothed. For example, closing may be used as the smoothing process. By the smoothing process, the binary image shown in FIG. 9E is generated from the binary image shown in FIG. 9D.

Next, when the white region is in contact with the lower end of the binary image at two locations and the number of contour pixels in contact with the lower end is equal to or greater than a predetermined value, the detection unit 211 detects the white region as the binary image. The image is divided into two white areas that contact the lower end at only one place (step 706). Then, the detection unit 211 generates region information 433 indicating the white region generated by the division.

For example, when the y-axis is set from the upper end to the lower end of the binary image and the x-axis is set from the left end to the right end, the y-coordinate value increases as it approaches the lower end of the binary image. In this case, the detection unit 211 obtains a contour part sandwiched between two contour parts in contact with the white region among the contours of the binary image, and is located at the top of the pixels of the obtained contour part. The x coordinate x1 of the pixel (the pixel with the smallest y coordinate) is obtained. Then, the detection unit 211 can divide the white area into two white areas by changing all white pixels whose x coordinate is x1 in the white area to black pixels.

Thereby, the binary image shown in FIG. 9F is generated from the binary image shown in FIG. 9E. In the binary image of FIG. 9F, the two white regions generated by the division correspond to a region including the left hand and a region including the right hand, respectively.

Therefore, the detection unit 211 compares the x coordinates of the contour portions included in the two white areas, determines the white area having the smaller x coordinate as a candidate area for the left hand, and selects the white area having the larger x coordinate as the white area. Determine the candidate area for the right hand. In this case, the detection unit 211 sets 2 to a variable nHands representing the number of detected candidate regions.

On the other hand, in step 704, when two white areas touching the lower end of the binary image at only one place are selected, the detection unit 211 sets nHands to 2 without dividing any white area. In addition, when one white region that touches the lower end of the binary image at only one location is selected, the detection unit 211 sets 1 to nHands.

Note that as the pixel values of the distance image 432 and the background distance image 431, values that increase as the subject is closer to the imaging device 401 and decrease as the subject is farther from the imaging device 401 can be used. In this case, in step 701, the difference between the foreground pixel values becomes a positive value. Therefore, the detection unit 211 uses a positive predetermined value as the threshold T1, sets the pixel value of the difference image to 255 (white) when the difference is larger than T1, and sets the pixel value of the difference image to be equal to or less than T1. The pixel value is set to 0 (black).

When a plurality of workers are working on the workbench 102 and the field of view of the imaging apparatus 401 includes N hands (N is an integer of 3 or more), in step 704, the detection unit 211 displays the area. A maximum of N white regions may be selected in order from the larger.

FIG. 10 is a flowchart showing an example of the hand position detection process in step 602 of FIG. First, the extraction unit 212 sets 0 to the control variable i indicating the i-th candidate area among the candidate areas indicated by the area information 433 (step 1001), and compares i with nHands (step 1002). When i is less than nHands (step 1002, YES), the line segment detection unit 421 performs a line segment detection process for the i-th candidate region (step 1003).

Next, the determination unit 422 performs parameter calculation processing (step 1004), performs bending determination processing (step 1005), and performs length determination processing (step 1006). Then, the specifying unit 213 performs position specifying processing (step 1007).

Next, the extraction unit 212 increments i by 1 (step 1008), and repeats the processing after step 1002. When i reaches nHands (step 1002, NO), the extraction unit 212 ends the process.

FIG. 11 is a flowchart showing an example of line segment detection processing in step 1003 of FIG. First, the line segment detection unit 421 thins the i-th candidate region (step 1101). For example, the line segment detection unit 421 can thin the candidate region using a thinning algorithm such as the Tamura method, the Zhang-Suen method, or the NWG method described in Non-Patent Document 1. When a branch occurs during thinning, the line segment detection unit 421 generates a single curve composed of an array of consecutive points, leaving only the longest branch of the plurality of branches.

Next, the line segment detector 421 approximates a curve with a plurality of connected line segments, and generates line segment information 434 indicating these line segments (step 1102). For example, the line segment detection unit 421 uses the Ramer-Douglas-Peucker algorithm described in Non-Patent Document 2 to convert a curve into a plurality of approximate line segments, and allows a predetermined tolerance for the deviation between the curve and the approximate line segments. It can be within the range of error.

According to such a line segment detection process, since the candidate area is converted into a combination of connected line segments, it is possible to determine the correspondence between each line segment and the upper limb, hand, object, etc. by a simple determination process. It becomes possible.

FIG. 12 shows an example of line segment information 434 in an array format. The array number is identification information indicating one end point of both ends of each line segment in the binary image, and the x coordinate and the y coordinate are coordinate points (x, y) common to the binary image and the distance image 432. ). The point corresponding to sequence number 0 is located at the lower end of the candidate region, and the corresponding point moves away from the lower end as the sequence number increases. The distance value Z represents a pixel value corresponding to the coordinates (x, y) of the distance image 432.

The line segment information 434 in FIG. 12 represents the connected n line segments, and both ends of the j-th (j = 0 to n−1) line segment correspond to the point corresponding to the array element number j and the array element number j + 1. 2 points corresponding to. The j-th line segment and the j + 1-th line segment are adjacent to each other and are connected at a point corresponding to the array element number j + 1. In this case, the line segment detection unit 421 sets n + 1 to a variable nPts that represents the number of end points.

FIG. 13 is a flowchart showing an example of parameter calculation processing in step 1004 of FIG. First, the determination unit 422 sets 0 to the control variable j representing the array element number of the line segment information 434 (step 1301), and compares j with nPts−1 (step 1302). When j is less than nPts−1 (step 1302, YES), the determination unit 422 determines the length len in the three-dimensional space of the j-th line segment having both ends corresponding to the array element number j and the array element number j + 1. _j is calculated (step 1303).

For example, the determination unit 422 can obtain the length len _j by a pinhole camera model using the coordinates (x, y) of two points corresponding to the array element number j and the array element number j + 1. When the pinhole camera model is used, the length len1 of the line segment in the three-dimensional space corresponding to the line segment between the point (x1, y1) and the point (x2, y2) in the binary image is Calculated by the formula.

X1 = (Z1 × (x1−cx)) / fx (1)
Y1 = (Z1 × (y1-cy)) / fy (2)
X2 = (Z2 × (x2-cx)) / fx (3)
Y2 = (Z2 × (y2-cy)) / fy (4)
len1 = ((X1-X2) ² + (Y1-Y2) ² + (Z1-Z2) ² ) ^1/2 (5)

Z1 and Z2 represent the distance value Z of the point (x1, y1) and the point (x2, y2), respectively, and (cx, cy) represents the coordinates of the principal point in the binary image. For example, the center of the binary image is used as the principal point. fx and fy are focal lengths expressed in units of pixels in the x-axis direction and the y-axis direction, respectively. The origin of the coordinate system representing the coordinates (X, Y, Z) in the three-dimensional space may be the installation position of the imaging device 401.

Next, the determination unit 422 compares j with nPts-2 (step 1304). When j is less than nPts−2 (step 1304, YES), the determination unit 422 determines that the jth line segment and the j + 1th line segment having both ends corresponding to the array element number j + 1 and the array element number j + 2 are both ends. calculating the angle theta _j between (step 1305).

For example, the determination unit 422 calculates the angle θ by calculating the inner product described in Non-Patent Document 3 using the coordinates (x, y) of three points corresponding to the array element number j, the array element number j + 1, and the array element number j + 2. _j can be obtained.

Next, the determination unit 422 increments j by 1 (step 1306), and repeats the processing after step 1302. If j reaches nPts−2 (step 1304, NO), the determination unit 422 skips the process of step 1305 and repeats the processes after step 1306. Next, when j reaches nPts−1 (step 1302, NO), the determination unit 422 ends the process.

FIG. 14 is a flowchart illustrating an example of the bending determination process in step 1005 of FIG. In the bending determination process, a line segment estimated to exist before the hand is excluded based on the determination result for the angle θ _j of the line segment.

From the arrangement of joints in the human body, there are two bent places between the upper arm and the hand, the elbow and the wrist. Therefore, the maximum number of folds between the lower end and the hand of the candidate area where the arm and hand are shown is two, and the third and subsequent folds are bent by fingers or bent by an object other than the hand. It is estimated that. Even when the number of bends is two or less, if the angle θ _j is smaller than the movable angle of the elbow or wrist, it is estimated that the bend is caused by a finger or an object.

First, the determination unit 422 sets the control variable j to 0, sets the variable Nbend representing the number of bends to 0 (step 1401), and compares j with nPts-2 (step 1402). When j is less than nPts−2 (step 1402, YES), the determination unit 422 compares θ _j and θ _min (step 1403). For example, θ _min is non-patent document 4.
Among the movable angles of the joints of the human body described in (1), it can be determined based on the movable angle of the elbow or wrist.

When θ _j is equal to or larger than θ _min (step 1403, YES), the determination unit 422 compares θ _j and θ _max (step 1404). θ _max is a threshold value for determining that there is no bending, and is set to a value larger than θ _min and smaller than 180 °. For example, θ _max may be an angle in the range of 150 ° to 170 °.

If θ _j is less than θ _max (step 1404, NO), the determination unit 422 increments Nbend by 1 (step 1405) and compares Nbend with 2 (step 1406). When Nbend is 2 or less (step 1406, NO), the determination unit 422 increments j by 1 (step 1408), and repeats the processing after step 1402.

When Nbend exceeds 2 (step 1406, YES), the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434, and changes nPts from n + 1 to j + 2 (step 1407). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434.

When θ _j is less than θ _min (step 1403, NO), the determination unit 422 performs the process of step 1407. Thereby, when θ _j is smaller than the movable angle of the elbow or wrist, the line segment corresponding to the finger or the object is deleted from the line segment information 434.

When θ _j is equal to or _larger than θ _max (step 1404, YES), the determination unit 422 determines that the point of the array element number j + 1 does not correspond to bending, skips the processing of step 1405 and step 1406, and after step 1408 Repeat the process. When j reaches nPts−2 (step 1402, NO), the determination unit 422 ends the process.

According to such a bending determination process, the number of line segments indicated by the line segment information 434 can be reduced, and line segment candidates that approximate the hand area can be narrowed down.

FIG. 15 shows an example of four line segments to be subjected to bending determination processing. pts [j] (j = 0 to 4) represents a point of the array element number j, and θ _j (j = 0 to 2) represents pts [j], p
This represents the angle between line segments calculated from the three points ts [j + 1] and pts [j + 2].

A line segment with both ends of pts [0] and pts [1] corresponds to the upper arm, a line segment with both ends of pts [1] and pts [2] corresponds to the forearm, and pts [1] corresponds to the elbow. To do. In addition, line segments having both ends of pts [2] and pts [3] correspond to the hand, and pts [2] corresponds to the wrist. On the other hand, a line segment having both ends of pts [3] and pts [4] corresponds to an object held by the hand, and pts [3] corresponds to a finger joint.

In this case, when j = 2, Nbend = 3 and Nbend exceeds 2, so pts [4] is deleted from the line segment information 434 and nPts is changed from 5 to 4. Thereby, the line segment corresponding to the object is deleted.

FIG. 16 is a flowchart illustrating an example of the length determination process in step 1006 of FIG. In the length determination process, a line segment that is presumed to exist before the hand is excluded based on the determination result with respect to the angle θ _j of the line segment and the length len _j of the line segment.

First, the determination unit 422 sets the control variable j to 0, sets the variable len representing the length in the three-dimensional space to 0, and sets the variable ctr representing the number of line segments to 0 (step 1601). , J and nPts−1 are compared (step 1602). When j is less than nPts−1 (step 1602, YES), the determination unit 422 adds len _j to len (step 1603) and compares θ _j with θ _max (step 1604). When θ _j is less than θ _max (step 1604, NO), the determination unit 422 compares len and len _max (
Step 1605).

len _max is an upper limit value of the length of the forearm, and can be determined based on the length of the forearm described in Non-Patent Document 4, for example. Further, when the height of the worker is known, len _max may be determined from the height based on the Vitruvian human figure of Leonardo da Vinci.

When len is larger than len _max (step 1605, YES), the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434 and changes nPts to j + 2 (step 1606). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434. Then, the determination unit 422 performs normal line determination processing in steps 1613 to 1620.

When θ _j is equal to or _larger than θ _max (step 1604, YES), the determination unit 422 determines that the point of the array element number j + 1 does not correspond to bending, increments j by 1 (step 1611), and after step 1602 Repeat the process. If j reaches nPts−1 (step 1602, NO), the determination unit 422 performs normal line determination processing in steps 1613 to 1620.

When len is less than or equal to len _max (step 1605, NO), the determination unit 422 determines that l
en and len _min are compared (step 1607). When len is equal to or greater than len _min (step 1607, NO), the determination unit 422 sets len to 0 (step 1612), and performs the processing after step 1611.

When len is less than len _min (step 1607, YES), the determination unit 422
ctr is incremented by 1 (step 1608), and ctr is compared with 2 (step 1609). When ctr is 2 or less (step 1609, NO), the determination unit 422 performs the processing after step 1612.

If ctr exceeds 2 (step 1609, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 1610). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434. Then, the determination unit 422 performs normal line determination processing in steps 1613 to 1620.

FIG. 17 shows an example of four line segments to be subjected to length determination processing. len _j (j = 0
(3) to (3) represent lengths in a three-dimensional space of line segments having both ends of pts [j] and pts [j + 1]. len ₀ corresponds to the length of a part of the upper arm, len ₁ corresponds to the length of the forearm, len ₂ corresponds to the length of the hand, and len ₃ corresponds to the length of the object held by the hand. To do.

First, when j = 0, len = len ₀ . However, θ ₀ is less than θ _max and le
Since n is greater than or equal to len _min and less than or equal to len _max , len is reset to 0. Next, when j = 1, len = len ₁ is satisfied. At this time, since θ ₁ is equal to or greater than θ _max , len
Is not reset.

When j = 2, len = len ₁ + len ₂ is satisfied. At this time, since θ ₂ is less than θ _max and len exceeds len _max , pts [4] is deleted from the line segment information 434, and nPts is changed from 5 to 4. Thereby, the line segment corresponding to the object is deleted.

FIG. 18 shows an example of three line segments to be subjected to length determination processing. len ₀ corresponds to the length of a part of the forearm, len ₁ corresponds to the length of the hand, and len ₂ corresponds to the length of the object held by the hand.

First, when j = 0, len = len ₀ . However, θ ₀ is less than θ _max and le
Since n is less than len _min , ctr is changed to 1 and len is reset to 0.
Next, when j = 1, len = len ₁ is satisfied. However, θ ₁ is less than θ _max and len
Is less than len _min , ctr is changed to 2 and len is reset to 0.

When j = 2, len = len ₂ is satisfied. However, θ ₂ is less than θ _max and l
Since en is less than len _min , ctr is changed to 3. At this time, since ctr exceeds 2, pts [3] is deleted from the line segment information 434, and nPts is changed from 4 to 3. Thereby, the line segment corresponding to the object is deleted.

In the normal line determination processing in step 1613 to step 1620, among the line segments indicated by the arrangement of points remaining in the line segment information 434, each of the plurality of points on the line segment having the furthest distance to the lower end of the candidate area is passed. Two intersection points of the straight line and the outline of the candidate area are obtained. Then, based on the distance in the three-dimensional space corresponding to the distance between the intersections, a portion approximating the hand region is extracted from the line segment. The line segment with the longest distance to the lower end of the candidate area is a line segment having both ends of pts [nPts-2] and pts [nPts-1], and is estimated to be a line segment corresponding to the hand.

First, the determination unit 422 divides a line segment L having both ends of pts [nPts-2] and pts [nPts-1] at an interval of a predetermined number of pixels, so that m pieces (m is 2 on the line segment L). (Integer) is obtained, and a normal line perpendicular to the line segment L is obtained at each point (step 1613). Then, the determination unit 422 calculates an intersection between each of the m normals and the contour of the candidate area. Since the line segment L exists in the candidate area and the contour of the candidate area exists on both sides of the line segment L, two intersections between each normal line and the contour are obtained.

Next, the determination unit 422 sets 0 to a control variable k indicating one normal line (step 1614), and compares k and m (step 1615). The normal line corresponding to k = 0 intersects with the line segment L at the lower end of the line segment L, and the corresponding normal line moves away from the lower end as k increases.

When k is less than m (step 1615, YES), the determination unit 422 calculates the distance between the two intersections of the kth normal line and the contour, and the length in the three-dimensional space corresponding to the calculated distance. n_len _k to calculate the (step 1616). Then, the determination unit 422 compares n_len _k and n_len _max (step 1617). n_len _max is an upper limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.

When n_len _k is equal to or smaller than n_len _max (step 1617, NO), the determination unit 422 compares n_len _k and n_len _min (step 1618). n_len _min
Is a lower limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.

When n_len _k is greater than or equal to n_len _min (step 1618, NO), the determination unit 422 increments k by 1 (step 1620), and repeats the processing after step 1615. When k reaches m (step 1615, NO), the determination unit 422 ends the process.

When n_len _k is larger than n_len _max (step 1617, YES), the determination unit 422 obtains an intersection between the k-th normal and the line segment L, and records the obtained intersection as pts [nPts−1] ′. (Step 1619). As a result, a portion from the point pts [nPts-2] to the point pts [nPts-1] ′ on the line segment L is extracted as a portion approximating the hand region. Then, the determination unit 422 generates hand region line segment information 435 including the x coordinate, the y coordinate, and the distance value Z of pts [nPts-2] and pts [nPts-1] ′.

also N_len _k is less than n_len _min (step 1618, YES), the determination unit 422 performs the process of step 1619.

FIG. 19 shows an example of a line segment to be subjected to normal line determination processing. In this example, line segments having both ends of pts [2] and pts [3] are divided at a predetermined interval to generate five normal lines. Therefore, m = 5.

First, when k = 0, since n_len ₀ is not less than n_len _min and not more than n_len _max , k is incremented. Next, when k = 1, since n_len ₁ is not less than n_len _min and not more than n_len _max , k is incremented. Next, when k = 2, since n_len ₂ is n_len _min or more and n_len _max or less, k is incremented.

When k = 3, since n_len ₃ exceeds n_len _max , the intersection point pts [3] ′ on the line segment L is obtained. As a result, a portion from the point pts [2] to the point pts [3] ′ is extracted as a portion approximating the hand region.

According to the length determination process in FIG. 16, a line segment corresponding to an object that has not been deleted by the bending determination process can be deleted, and a line segment approximating the hand region is specified. And the part which approximates a hand area on the specified line segment can be specified.

FIG. 20 is a flowchart showing an example of the position specifying process in step 1007 of FIG. In the position specifying process, the center position of the palm is specified by obtaining the maximum inscribed circle inscribed in the outline of the candidate area in the hand area corresponding to the line segment indicated by the hand area line segment information 435.

Since the width of the palm is wider than the width of the closed finger, the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the closed finger area. Further, since the width of the palm is wider than the width of the wrist, the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the region of the wrist. Therefore, the center of the maximum inscribed circle in the hand region can be regarded as the center of the palm.

First, the specifying unit 213 sets a scanning range between the points pts [nPts-2] and the points pts [nPts-1] ′, and is inscribed in the outline of the candidate area around each scanning point in the scanning range. A tangent circle is obtained (step 2001). Then, the specifying unit 213 obtains the coordinates (xp, yp) of the center of the largest inscribed circle among the inscribed circles centered on each of the plurality of scanning points. The scanning range may be a line segment having points pts [nPts-2] and pts [nPts-1] ′ as both ends, and is an area in which the line segment is expanded by a predetermined number of pixels in the x direction and the y direction. There may be.

For example, the specifying unit 213 obtains the minimum value d of the distance from each scanning point to the contour of the candidate area, and obtains the coordinates (xmax, ymax) of the scanning point at which the minimum value d is maximum. Then, the specifying unit 213 records the coordinates (xmax, ymax) as the coordinates (xp, yp) of the center of the maximum inscribed circle. In this case, the minimum distance d at the scanning point (xmax, ymax) is the radius of the maximum inscribed circle.

FIG. 21 shows an example of the maximum inscribed circle. In this example, a line segment 2101 having both ends of the points pts [nPts-2] and pts [nPts-1] ′ is set as the scanning range, and the maximum inscribed circle 2103 centering on the point 2102 on the line segment 2101 is set. Is required.

Next, the specifying unit 213 obtains a point in the three-dimensional space corresponding to the center of the maximum inscribed circle, and generates position information 436 indicating the position of the obtained point (step 2002). For example, the specifying unit 213 obtains the coordinates (Xp, Yp, Zp) of the point in the three-dimensional space using the pinhole camera model, using the coordinates (xp, yp) of the center of the maximum inscribed circle and the distance value Zp. be able to. In this case, Xp and Yp are calculated by the following equations.

Xp = (Zp × (xp−cx)) / fx (11)
Yp = (Zp × (yp−cy)) / fy (12)

FIG. 22 shows a second specific example of the image processing apparatus 201 of FIG. The image processing apparatus 201 in FIG. 22 has a configuration in which a detection unit 2201 and an upper limb detection unit 2202 are added to the image processing apparatus 201 in FIG. 4, and obtains the position of the hand and the upper limb from the distance image 432.

FIG. 23 shows an example of the visual field range of the imaging device 401 of FIG. The imaging apparatus 401 in FIG. 23 is installed above the workbench 102, and the visual field range 2301 is gripped by the workbench 102, the upper body 2302 including the left hand 502 and the right hand 503 of the worker 101, and the right hand 503. The object 2303 is included. Therefore, the work table 102, the upper body 2302, and the object 2303 are shown in the distance image 432 captured by the imaging device 401.

The XW axis, YW axis, and ZW axis represent the world coordinate system, and the origin O is provided on the floor of the work place. The XW axis is provided in parallel with the long side of the work table 102 and represents the direction from the left shoulder to the right shoulder of the worker 101. The YW axis is provided in parallel with the short side of the work table 102 and represents the direction from the front to the back of the worker 101. The ZW axis is provided perpendicular to the floor surface and represents a direction from the floor surface toward the top of the worker 101.

The detection unit 2201 detects a body region from the distance image 432 using the background distance image 431 and stores region information 2211 indicating the detected body region in the storage unit 412. The body region is a region where the upper body 2302 of the worker 101 is estimated to be captured.

The upper limb detection unit 2202 identifies the position of the upper limb including the wrist, elbow, or shoulder using the distance image 432, the region information 2211, and the center position of the palm, and the position information 2212 indicating the identified position is stored in the storage unit 412. To store. The output unit 411 outputs a recognition result based on the position information 436 and the position information 2212. The recognition result may be a trajectory indicating a change in the three-dimensional position of the hand and the upper limb, or may be information indicating an operator's motion estimated from the trajectory of the hand and the upper limb.

According to the image processing apparatus 201 in FIG. 22, it is possible to recognize the movement of the worker in consideration of not only the movement of the hand but also the movement of the upper limb, and the recognition accuracy of the movement of the worker is further improved.

FIG. 24 is a flowchart illustrating a specific example of image processing performed by the image processing apparatus 201 in FIG. In this image processing, the distance image 432 is divided into two areas, a hand detection area and a body detection area.

FIG. 25 shows an example of a distance image 432 and a background distance image 431 obtained by photographing the visual field range 2301 of FIG. FIG. 25A shows an example of the distance image 432, and FIG. 25B shows an example of the background distance image 431. The distance image 432 is divided into a hand detection area 2501 and a body detection area 2502, and the background distance image 431 is similarly divided into two areas.

First, the detection unit 2201 performs body region detection processing using the body detection region of the distance image 432 (step 2401), and the detection unit 211 performs candidate region detection processing using the hand detection region of the distance image 432 (step 2401). Step 2402). Next, the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 2403), and the upper limb detection unit 2202 performs upper limb position detection processing (step 2404).

FIG. 26 is a flowchart showing an example of body region detection processing in step 2401 of FIG. First, the detection unit 2201 subtracts the pixel value of each pixel in the body detection area of the background distance image 431 from the pixel value of each pixel in the body detection area of the distance image 432, generates a difference image, and generates the generated difference. The image is binarized (step 2601).

Next, the detection unit 2201 performs opening and closing for each pixel of the binary image (step 2602). Then, the detection unit 2201 extracts, as a body region, a white region having the maximum area among the white regions included in the binary image (step 2603), and obtains the center of gravity of the body region (step 2604). For example, the detection unit 2201 can obtain the coordinates of the center of gravity of the body region by calculating the average value of the x and y coordinates of each of the plurality of pixels included in the body region.

Next, the detection unit 2201 obtains the position of the head shown in the body region (step 2505). For example, the detection unit 2201 generates a histogram of distance values for each of a plurality of pixels included in the body region, and a threshold value such that the number of pixels having a distance value equal to or less than the threshold value THD is greater than or equal to a predetermined number from the generated histogram. Determine the THD. Next, the detection unit 2201 selects the maximum region as a head region among regions composed of pixels having a distance value equal to or greater than the threshold value THD, and obtains the coordinates of the center of gravity of the head region as the head position. Then, the detection unit 2201 generates region information 2211 indicating the coordinates of the center of gravity of the body region and the head region.

24, the detection unit 211 performs a candidate area detection process similar to that of FIG. 7 using the hand detection areas of the distance image 432 and the background distance image 431.

FIG. 27 is a flowchart showing an example of the hand position detection process in step 2403 of FIG. The processing in Step 2701 to Step 2706, Step 2708, and Step 2709 in FIG. 27 is the same as the processing in Step 1001 to Step 1008 in FIG. The determination unit 422 performs a three-dimensional direction determination process after performing the length determination process (step 2707).

FIG. 28 is a flowchart showing an example of the three-dimensional direction determination process in step 2707 of FIG. In the three-dimensional direction determination process, based on the result of comparing the direction in the three-dimensional space of each line segment and the direction in the three-dimensional space of the subject reflected in the area including the line segment, Presumed line segments are excluded.

First, the determination unit 422 sets 0 to the control variable j (step 2801), and compares j with nPts−1 (step 2802). When j is less than nPts−1 (step 2802, YES), the determination unit 422 determines the coordinates (X _j ,
Y _j , Z _j ) and coordinates (X _{j + 1} , Y _{j + 1} , Z _{j + 1} ) in the three-dimensional space of pts [j + 1] are obtained (step 2803). For example, the determination unit 422 can obtain (X _j , Y _j , Z _j ) and (X _{j + 1} , Y _{j + 1} , Z _{j + 1} ) using a pinhole camera model.

Next, the determination unit 422 uses (X _j , Y _j , Z _j ) and (X _{j + 1} , Y _{j + 1} , Z _{j + 1} ) to convert pts [j] and pts [j + 1] to both ends. A direction vector V _j indicating the direction of the line segment L _{j in} the three-dimensional space is obtained (step 2804). For example, the determination unit 422 can determine a vector from (X _j , Y _j , Z _j ) to (X _{j + 1} , Y _{j + 1} , Z _{j + 1} ) in the three-dimensional space as V _j. . The determination unit 422 may obtain the coordinates in the three-dimensional space of each of a plurality of points on the line segment L _j and obtain a vector that approximates a curve connecting these coordinates as V _j .

Next, the determination unit 422 sets a peripheral region of the line segment L _j in the candidate region, obtains coordinates in a three-dimensional space of each of a plurality of points in the peripheral region, and performs principal component analysis on these coordinates ( Step 2805). Then, the determination unit 422 obtains the first principal component vector EV _j from the result of the principal component analysis (step 2806). EV _j indicates the direction in the three-dimensional space of the subject shown in the area including the line segment L _j .

FIG. 29 shows an example of the peripheral region in the case of j = 3. In this case, the line segment L ₃ is pts.
[3] and pts [4] are both line segments. For example, the determination unit 422 obtains a normal line passing through pts [3] and a normal line passing through pts [4], and an area 2901 surrounded by the outlines of the two normal lines and the candidate area is obtained as a line segment L It can be set as ₃ peripheral areas.

Next, the determination unit 422 obtains an angle γ _j between V _j and EV _j (step 2807), and compares γ _j with a threshold γ _th (step 2808). When γ _j is less than γ _th (step 28)
(08, NO), the determination unit 422 increments j by 1 (step 2810), and repeats the processing after step 2802. If j reaches nPts−1 (step 2802, NO), the determination unit 422 ends the process.

When γ _j is larger than γ _th (step 2808, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 2809). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434.

According to such a three-dimensional direction determination process, it is possible to delete a line segment corresponding to the object that has not been deleted by the bending determination process and the length determination process.

FIG. 30 shows an example of the first principal component vectors EV ₀ to EV ₃ . EV ₀ to EV ₂ are obtained by principal component analysis on the peripheral regions of the line segments L ₀ to L ₂ , and correspond to the upper arm, forearm, and hand regions, respectively. In this case, the directions of EV ₀ to EV ₁ are close to the directions of the direction vectors V ₀ to V _{2 of} L ₀ to L ₂ .

On the other hand, the direction of EV ₃ corresponding to the region of the object 3001 hand is holding is different significantly from the direction of the direction vector V ₃ of the line segment L _3, the angle gamma ₃ between V ₃ and EV ₃ It becomes larger than γ _th . Therefore, pts [4] is deleted from the line segment information 434, and nPts is changed from 5 to 4. Thereby, the line segment corresponding to the object 3001 is deleted.

FIG. 31 is a flowchart showing an example of the upper limb position detection process in step 2404 of FIG. First, the upper limb detection unit 2202 obtains the position of the center of gravity of the head region indicated by the region information 2211 in the world coordinate system (step 3101). For example, the upper limb detection unit 2202 can obtain coordinates in a three-dimensional space using a pinhole camera model using the coordinates of the center of gravity of the head region and the distance value.

Then, the upper limb detection unit 2202 converts the obtained coordinates into coordinates (XWH, YWH, ZWH) in the world coordinate system illustrated in FIG. 23 using an RT matrix based on the external parameters of the imaging device 401. The external parameters of the imaging device 401 include the height from the floor surface to the installation position of the imaging device 401 and the tilt angle of the imaging device 401, and the RT matrix includes the three-dimensional coordinate system and the world coordinates with the imaging device 401 as the origin. Represents rotation and translation between systems. The obtained ZWH represents the height from the floor surface to the top of the head, and corresponds to the approximate height of the operator.

Next, the upper limb detection unit 2202 obtains coordinates in the three-dimensional space of the center of gravity of the body area indicated by the area information 2211 and converts the obtained coordinates into coordinates (XWB, YWB, ZWB) in the world coordinate system using the RT matrix. (Step 3102).

Next, the upper limb detection unit 2202 obtains the approximate position of both shoulders in the world coordinate system using ZWH as the height (step 3103). For example, the upper limb detection unit 2202 determines the ratio of the height of both shoulders to the height and the ratio of the shoulder width to the height based on the Vitruvian human figure, and the height of the left shoulder (ZW coordinates) ZWLS, right The shoulder height ZWRS and the shoulder width SW can be obtained. Then, the upper limb detection unit 2202 obtains XWB−SW / 2 as the lateral position (XW coordinate) XWLS of the left shoulder, obtains XWB + SW / 2 as the lateral position XWRS of the right shoulder, and determines YWB as the YW coordinates of the left shoulder and the right shoulder. To do.

The upper limb detection unit 2202 may use XWH and YWH instead of XWB and YWB.

Next, the upper limb detection unit 2202 sets 0 to the control variable i indicating the i-th candidate region (step 3104), and compares i with nHands (step 3105). When i is less than nHands (step 3105, YES), the upper limb detection unit 2202 obtains the wrist position based on the palm center position obtained by the specifying unit 213 in the i-th candidate region (step 3106). . Then, the upper limb detection unit 2202 converts the coordinates of the wrist position into the coordinates of the world coordinate system using the RT matrix.

For example, the upper limb detection unit 2202 is present on the side closer to the lower end (body side) of the candidate region than the center position of the palm among pts [0] to pts [nPts-1] indicated by the line segment information 434, and The point closest to the center position can be determined as the wrist position. The upper limb detection unit 2202 may determine a point on the line segment that exists at a position away from the center position of the palm by a predetermined distance as the position of the wrist.

FIG. 32 shows an example of the position of the wrist. In this example, among pts [0] to pts [2] existing closer to the lower end of the candidate region than the palm center position 3201, pts [2] closest to the palm center position 3201 is the wrist position. To be determined.

Next, the upper limb detection unit 2202 obtains the elbow position based on the wrist position in the i-th candidate region, and converts the elbow position coordinates into coordinates in the world coordinate system using the RT matrix (step 3107). ).

For example, the upper limb detection unit 2202 determines the ratio of the forearm length to the height based on the Vitruvian human figure, obtains the forearm length from the ZWH, and calculates the length of the forearm on the image in the candidate region. Can be converted. The upper limb detection unit 2202 obtains a point separated from the wrist position by the length of the forearm on the image toward the side closer to the lower end of the candidate region, and among the pts [0] to pts [nPts−1] Then, a point existing within a predetermined error range from the obtained point is determined as the elbow position. For example, in the example of FIG. 32, pts [1] is determined as the elbow position.

When the point indicated by the line segment information 434 does not exist within the predetermined error range, the upper limb detection unit 2202 includes the circle centered on the wrist position and the forearm length on the image as a radius, and the line segment information 434 An intersection with the line segment to be shown may be obtained, and the obtained intersection may be determined as the elbow position. When the elbow is not shown in the hand detection area of the distance image 432 and there is no intersection between the circle and the line segment, the upper limb detection unit 2202 is on the extension line of the line segment connecting the wrist position and pts [0]. A point separated from the wrist position by the length of the forearm on the image is determined as the elbow position.

Next, the upper limb detection unit 2202 corrects the position of the shoulder based on the position of the elbow in the world coordinate system (step 3108). For example, the upper limb detection unit 2202 can determine the ratio of the length of the upper arm to the height based on the Vitruvian human figure, and can determine the length of the upper arm from the ZWH. Then, the upper limb detection unit 2202 obtains a three-dimensional distance UALen between the elbow coordinates and the shoulder coordinates in the world coordinate system.

When UALen exceeds the length of the upper arm, the upper limb detection unit 2202 moves the position of the shoulder on the three-dimensional straight line connecting the elbow coordinates and the shoulder coordinates so that UAlen matches the length of the upper arm. . On the other hand, when UALen is equal to or less than the length of the upper arm, the upper limb detection unit 2202 does not correct the coordinates of the shoulder. Then, the upper limb detection unit 2202 generates position information 2212 indicating the positions of the wrist, elbow, and shoulder in the world coordinate system.

Next, the upper limb detection unit 2202 increments i by 1 (step 3109), and repeats the processing after step 3105. When i reaches nHands (step 3105, NO), the upper limb detection unit 2202 ends the process.

In Step 3103, Step 3107, and Step 3108, the upper limb detection unit 2202 may use a predetermined value set in advance as the height instead of ZWH.

FIG. 33 shows a functional configuration example of an image processing system including the image processing apparatus 201 of FIG. 4 or FIG. The image processing system in FIG. 33 includes an image processing device 201 and an image processing device 3301.

A transmission unit 3311 in the image processing apparatus 201 corresponds to the output unit 411 in FIG. 4 or FIG. 22, and transmits a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212 via a communication network. To the image processing apparatus 3301. For example, the image processing apparatus 201 generates a plurality of recognition results in each of a plurality of time zones, and transmits the recognition results to the image processing apparatus 3301 in time series.

The image processing device 3301 includes a reception unit 3321, a display unit 3322, and a storage unit 3323. The receiving unit 3321 receives a plurality of recognition results in time series from the image processing apparatus 201, and the storage unit 3323 stores the received plurality of recognition results in association with each of a plurality of time zones. The display unit 3322 displays the time-series recognition results stored in the storage unit 3323 on the screen.

The image processing apparatus 201 may be a server installed at a work site such as a factory, or may be a server on the cloud that communicates with the imaging apparatus 401 via a communication network. The image processing device 3301 may be a server or a terminal device of an administrator who monitors the operation of the worker.

Note that the functions of the image processing apparatus 201 in FIG. 4 or 22 can be distributed and implemented in a plurality of apparatuses connected via a communication network. For example, the detection unit 211, the extraction unit 212, the specifying unit 213, the detection unit 2201, and the upper limb detection unit 2202 may be provided in different devices.

The configuration of the image processing apparatus 201 in FIGS. 2, 4, and 22 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing apparatus 201.

The configuration of the image processing system in FIG. 33 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing system.

3, 6, 7, 10, 11, 13, 14, 16, 20, 24, 26 to 28, and 31 are merely examples, and the image processing apparatus Depending on the configuration or conditions of 201, some processes may be omitted or changed.

For example, in the candidate area detection process of FIG. 7, when the process is simplified, the processes of Step 702, Step 703, and Step 705 can be omitted. In step 704, the detection unit 211 may select a white region that is in contact with the upper end, the left end, or the right end of the binary image as a candidate region. In the hand position detection process of FIG. 10, either the bending determination process in step 1005 or the length determination process in step 1006 may be omitted.

In the length determination process in FIG. 16, when the process is simplified, the normal determination process in steps 1613 to 1620 can be omitted. In the position specifying process of FIG. 20, the specifying unit 213 may determine the midpoint of the line segment indicated by the hand region line segment information 435 as the center position of the palm instead of the center of the maximum inscribed circle.

In the body region detection process of FIG. 26, when the process is simplified, the process of step 2602 can be omitted. In the hand position detection process of FIG. 27, either the bending determination process in step 2705 or the length determination process in step 2706 may be omitted. In the three-dimensional direction determination process of FIG. 28, the determination unit 422 may determine a vector indicating the direction in the three-dimensional space of the subject shown in the region including the line segment L _j by a method other than the principal component analysis.

In the upper limb position detection process of FIG. 31, the upper limb detection unit 2202 does not need to obtain all positions of the shoulder, wrist, and elbow. For example, the upper limb detection unit 2202 may generate position information 2212 indicating the position of any one of a shoulder, a wrist, and an elbow.

The installation position of the three-dimensional distance sensor in FIG. 1 and the installation position of the imaging device in FIGS. 5 and 23 are merely examples, and the three-dimensional distance sensor or the imaging device is installed at a position where the operator can be photographed from another angle. May be.

8 and 25 are merely examples, and the distance image and the background distance image change according to the subject existing in the visual field range of the imaging apparatus. The hand detection area and the body detection area in FIG. 25 are merely examples, and a hand detection area and a body detection area having different positions or different shapes may be used.

The line segment information in FIG. 12 and the line segments in FIGS. 15, 17 to 19, and 32 are merely examples, and the line segment information and the line segments change according to the captured distance image. The maximum inscribed circle in FIG. 21, the peripheral region in FIG. 29, and the vector of the first principal component in FIG. 30 are only examples, and the vector of the maximum inscribed circle, the peripheral region, and the first principal component were taken. It changes according to the distance image. Other shaped peripheral regions may be used.

FIG. 34 shows a configuration example of an information processing apparatus (computer) used as the image processing apparatus 201 in FIGS. 2, 4, and 22. 34 includes a central processing unit (CPU) 3401, a memory 3402, an input device 3403, an output device 3404,
An auxiliary storage device 3405, a medium driving device 3406, and a network connection device 3407 are provided. These components are connected to each other by a bus 3408. The imaging device 401 may be connected to the bus 3408.

The memory 3402 is a semiconductor memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), and a flash memory, and stores programs and data used for processing. The memory 3402 can be used as the storage unit 412.

For example, the CPU 3401 (processor) executes a program using the memory 3402 to detect the detection unit 211, the extraction unit 212, the specifying unit 213, the line segment detection unit 421, the determination unit 422, the detection unit 2201, and the upper limb detection. The unit 2202 operates.

The input device 3403 is, for example, a keyboard, a pointing device, etc., and is used for inputting an instruction or information from an operator or a user. The output device 3404 is, for example, a display device, a printer, a speaker, or the like, and is used to output an inquiry or processing result to the operator or the user. The processing result may be a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212. The output device 3404 can be used as the output unit 411.

The auxiliary storage device 3405 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, a tape device, or the like. The auxiliary storage device 3405 may be a flash memory or a hard disk drive. The information processing apparatus can store a program and data in the auxiliary storage device 3405 and load them into the memory 3402 for use. The auxiliary storage device 3405 can be used as the storage unit 412.

The medium driving device 3406 drives a portable recording medium 3409 and accesses the recorded contents. The portable recording medium 3409 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like. The portable recording medium 3409 includes Compact Disk Read Only Memory (CD-ROM), Digital Versatile Disk (DVD), Universal Serial Bus (US).
B) It may be a memory or the like. An operator or user can store programs and data in the portable recording medium 3409 and load them into the memory 3402 for use.

As described above, the computer-readable recording medium for storing the program and data used for the processing is the memory 3402, the auxiliary storage device 3405, or the portable recording medium 340.
9 is a physical (non-temporary) recording medium.

The network connection device 3407 is a communication interface that is connected to a communication network such as a local area network or a wide area network, and performs data conversion accompanying communication. The information processing apparatus can receive a program and data from an external apparatus via the network connection apparatus 3407, and can use them by loading them into the memory 3402. The network connection device 3407 can be used as the output unit 411 or the transmission unit 3311.

Note that the information processing apparatus does not have to include all the components shown in FIG. 34, and some components may be omitted depending on the application or conditions. For example, if it is not necessary to input an instruction or information from an operator or user, the input device 3403 may be omitted. When the portable recording medium 3409 or the communication network is not used, the medium driving device 3406 or the network connection device 3407 may be omitted.

The information processing apparatus in FIG. 34 can also be used as the image processing apparatus 3301 in FIG. In this case, the network connection device 3407 is used as the reception unit 3321, the memory 3402 or the auxiliary storage device 3405 is used as the storage unit 3323, and the output device 3404 is used as the display unit 3322.

Although the disclosed embodiments and their advantages have been described in detail, those skilled in the art will be able to make various changes, additions and omissions without departing from the scope of the invention as set forth in the appended claims. .

With respect to the embodiment described with reference to FIGS. 1 to 34, the following additional notes are disclosed.
(Appendix 1)
A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
An image processing apparatus comprising:
(Appendix 2)
The detection unit detects a region in contact with the end of the distance image as the candidate region, and the extraction unit determines that the distance to the end is the two lines when the angle is smaller than a first threshold. A line segment that is farther than the minute is excluded from line segment candidates that approximate the hand region, and a line segment that is the farthest to the end of the remaining line segments is a line segment that approximates the hand region. The image processing apparatus according to appendix 1, wherein the image processing apparatus is extracted as:
(Appendix 3)
In the extraction unit, a first line segment of the two line segments is in contact with the end part, and a second line segment of the two line segments is adjacent to a third line segment, The third line segment is adjacent to the fourth line segment, and an angle between the first line segment and the second line segment is the first threshold value and a second threshold value that is larger than the first threshold value. An angle range between the second line segment and the third line segment is included in the angle range, and an angle between the third line segment and the fourth line segment is the angle range. The image processing apparatus according to appendix 2, wherein the fourth line segment is excluded from line segment candidates that approximate the hand region.
(Appendix 4)
When the angle is smaller than a second threshold value that is larger than the first threshold value, the extraction unit corresponds to a line segment having a shorter distance to the end portion among the two line segments. Using the length of the line segment, it is determined whether to exclude the line segment that is farther from the end of the two line segments from the line segment candidates that approximate the hand region. The image processing apparatus according to appendix 2, wherein:
(Appendix 5)
The extraction unit obtains two intersections between a straight line passing through each of a plurality of points on a line segment having the longest distance to the end and the contour of the candidate area, and corresponds to the distance between the two intersections. Then, based on the distance in the three-dimensional space, the part that approximates the hand region is extracted from the line segment that is the farthest to the end, and the specifying unit uses the part that approximates the hand region. The image processing apparatus according to

appendix

3 or 4, wherein the position of the hand is specified.
(Appendix 6)
The extraction unit is configured to detect the three-dimensional object captured in a region including a direction in the three-dimensional space of a line segment that is farther to the end and a line segment that is farther to the end. If the angle between the two directions is greater than the third threshold value, the line segment that is farther to the end is excluded from the line segment candidates that approximate the hand region. The image processing apparatus according to any one of appendices 3 to 5, wherein
(Appendix 7)
The extraction unit obtains a curve by thinning the candidate area, and obtains a plurality of line segments approximating the curve as the plurality of line segments approximating the candidate area. The image processing apparatus according to any one of claims 6 to 6.
(Appendix 8)
The image processing apparatus according to any one of appendices 1 to 7, further comprising an upper limb detection unit that obtains a position of a wrist, an elbow, or a shoulder using the distance image and the position of the hand.
(Appendix 9)
9. The image processing device according to any one of appendices 2 to 8, wherein the first threshold value represents a lower limit value of a bending angle of a wrist or elbow joint.
(Appendix 10)
A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
A display unit for displaying position information indicating the position of the hand;
An image processing system comprising:
(Appendix 11)
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing program for causing a computer to execute processing.
(Appendix 12)
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing program according to appendix 11.
(Appendix 13)
Computer
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing method.
(Appendix 14)
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing method according to appendix 13.

DESCRIPTION OF SYMBOLS 101 Worker 102

Worktable

103, 2303 Object 111 Three-

dimensional distance sensor

112, 501, 2301, 3001 Field of

view range

201, 3301

Image processing device

211, 2201 Detection unit 212 Extraction unit 213 Identification unit 401 Imaging device 411

Output unit

412, 3323 Storage unit 421 Line segment detection unit 422 Determination unit 431 Background distance image 432

Distance image

433, 2211 Area information 434 Line segment information 435 Hand

area segment information

436, 2212 Position information 502 Left hand 503 Right hand 2101 Line segment 2102 Point 2103 Maximum inscribed Circle 2202 Upper limb detection unit 2302 Upper body 2501 Hand detection region 2502 Body detection region 2901 region 3201 Center position 3311 Transmission unit 3321 Reception unit 3322 Display unit 3401 CPU
3402 Memory 3403 Input device 3404 Output device 3405 Auxiliary storage device 3406 Medium drive device 3407 Network connection device 3408 Bus 3409 Portable recording medium

Claims

A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
An image processing apparatus comprising:
The detection unit detects a region in contact with the end of the distance image as the candidate region, and the extraction unit determines that the distance to the end is the two lines when the angle is smaller than a first threshold. A line segment that is farther than the minute is excluded from line segment candidates that approximate the hand region, and a line segment that is the farthest to the end of the remaining line segments is a line segment that approximates the hand region. The image processing device according to claim 1, wherein the image processing device is extracted as:
In the extraction unit, a first line segment of the two line segments is in contact with the end part, and a second line segment of the two line segments is adjacent to a third line segment, The third line segment is adjacent to the fourth line segment, and an angle between the first line segment and the second line segment is the first threshold value and a second threshold value that is larger than the first threshold value. An angle range between the second line segment and the third line segment is included in the angle range, and an angle between the third line segment and the fourth line segment is the angle range. The image processing apparatus according to claim 2, wherein the fourth line segment is excluded from line segment candidates that approximate the hand region.
When the angle is smaller than a second threshold value that is larger than the first threshold value, the extraction unit corresponds to a line segment having a shorter distance to the end portion among the two line segments. Using the length of the line segment, it is determined whether to exclude the line segment that is farther from the end of the two line segments from the line segment candidates that approximate the hand region. The image processing apparatus according to claim 2, wherein:
The extraction unit obtains two intersections between a straight line passing through each of a plurality of points on a line segment having the longest distance to the end and the contour of the candidate area, and corresponds to the distance between the two intersections. Then, based on the distance in the three-dimensional space, the part that approximates the hand region is extracted from the line segment that is the farthest to the end, and the specifying unit uses the part that approximates the hand region. 5. The image processing apparatus according to claim 3, wherein the position of the hand is specified.
The extraction unit is configured to detect the three-dimensional object captured in a region including a direction in the three-dimensional space of a line segment that is farther to the end and a line segment that is farther to the end. If the angle between the two directions is greater than the third threshold value, the line segment that is farther to the end is excluded from the line segment candidates that approximate the hand region. The image processing apparatus according to claim 3, wherein the image processing apparatus is an image processing apparatus.
The extraction unit obtains a curve by thinning the candidate area, and obtains a plurality of line segments that approximate the curve as the plurality of line segments that approximate the candidate area. 7. The image processing device according to any one of items 1 to 6.
The image processing apparatus according to claim 1, further comprising an upper limb detection unit that obtains a position of a wrist, an elbow, or a shoulder using the distance image and the position of the hand. .
9. The image processing apparatus according to claim 2, wherein the first threshold value represents a lower limit value of a bending angle of a wrist or elbow joint.
A detection unit that detects a candidate area in which a hand is captured from a distance image;
An extraction unit that extracts a line segment that approximates a hand region from the plurality of line segments based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region;
Using a line segment that approximates the hand region, a specifying unit that specifies the position of the hand in the candidate region;
A display unit for displaying position information indicating the position of the hand;
An image processing system comprising:
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing program for causing a computer to execute processing.
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing program according to claim 11.
Computer
From the distance image, detect the candidate area where the hand is reflected,
Based on the angle between two adjacent line segments out of a plurality of line segments approximating the candidate area, a line segment approximating a hand area is extracted from the plurality of line segments,
Identifying the position of the hand in the candidate region using a line segment approximating the hand region;
An image processing method.
The computer detects a region that is in contact with an end of the distance image as the candidate region, and when the angle is smaller than a first threshold, a line segment whose distance to the end is farther than the two line segments. Is extracted from the line segment candidates that approximate the hand region, and the line segment that is the farthest to the end of the remaining line segments is extracted as a line segment that approximates the hand region. The image processing method according to claim 13.