WO2018135326A1 - Image processing device, image processing system, image processing program, and image processing method
- Publication number
- WO2018135326A1 (PCT/JP2018/000120; JP2018000120W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- line segment
- hand
- region
- image processing
- line
Classifications
- G06T7/00—Image analysis (G—PHYSICS; G06—COMPUTING OR CALCULATING; COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)
- G06T7/60—Analysis of geometric attributes
- G06T7/70—Determining position or orientation of objects or cameras
Definitions
- the present invention relates to an image processing apparatus, an image processing system, an image processing program, and an image processing method.
- in recent years, with the widespread use of three-dimensional distance sensors, many techniques have been developed that detect the user's skeleton, joint positions, and the like from distance images and recognize gestures and other actions based on the detection results (for example, Patent Document 1 and Patent Document 2).
- the distance sensor and the distance image may be referred to as a depth sensor and a depth image, respectively.
- methods for thinning lines in a binarized image, converting a sequence of continuous points into a plurality of approximate line segments, and obtaining the angle formed by three points, as well as prescribed values for the dimensions of the human body, are also known (for example, Non-Patent Document 1 to Non-Patent Document 4).
- an object of the present invention is to accurately identify the position of a hand from a distance image of a hand holding an object.
- the image processing apparatus includes a detection unit, an extraction unit, and a specification unit.
- the detection unit detects a candidate area where a hand is captured from the distance image.
- the extraction unit extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments that approximate the candidate region.
- the specifying unit specifies the position of the hand in the candidate area using a line segment that approximates the hand area.
- the position of the hand can be accurately identified from the distance image of the hand holding the object.
- FIG. 1 shows an example of a three-dimensional distance sensor installed in an assembly line.
- the worker 101 performs product assembly work while holding an object 103 such as a tool or a part on the work table 102.
- the field of view 112 of the three-dimensional distance sensor 111 installed above the work table 102 includes the hand of the worker 101, and the three-dimensional distance sensor 111 can obtain a distance image showing the hand during work.
- while the hand is holding the object 103, the difference between the distance value of the hand and the distance value of the object 103 in the distance image becomes small. For this reason, it is difficult to separate the hand and the object 103 by a method that separates the foreground and the background based on the distance value. Even if the hand region is detected based on the difference between the captured distance image and a background distance image showing only the background, both the hand and the object 103 appear in the background difference, and it is therefore difficult to separate them.
- the skeleton of the user's body parts can be recognized by tracking using machine learning, and a tool held by the user can be recognized by tracking using a passive method or an active method.
- in passive tracking, an infrared retro-reflective marker is attached to the tool, and the tool is recognized by detecting the marker with an external device such as a camera.
- in active tracking, a three-dimensional position sensor, an acceleration sensor, and the like are built into the tool, and these sensors notify the tracking system of the position of the tool.
- FIG. 2 shows a functional configuration example of the image processing apparatus.
- the image processing apparatus 201 in FIG. 2 includes a detection unit 211, an extraction unit 212, and a specification unit 213.
- FIG. 3 is a flowchart illustrating an example of image processing performed by the image processing apparatus 201 in FIG.
- the detection unit 211 detects a candidate area where a hand is shown from the distance image (step 301).
- the extraction unit 212 extracts a line segment that approximates the hand region from a plurality of line segments that approximate the candidate region, based on an angle between two adjacent line segments among the plurality of line segments (step 302).
- the specifying unit 213 specifies the position of the hand in the candidate area using a line segment that approximates the hand area (step 303).
- the position of the hand can be accurately identified from the distance image of the hand holding the object.
- FIG. 4 shows a first specific example of the image processing apparatus 201 of FIG.
- the image processing apparatus 201 in FIG. 4 includes a detection unit 211, an extraction unit 212, a specification unit 213, an output unit 411, and a storage unit 412.
- the extraction unit 212 includes a line segment detection unit 421 and a determination unit 422.
- the imaging device 401 is an example of a three-dimensional distance sensor.
- the imaging device 401 captures a distance image 432 including a pixel value representing the distance from the imaging device 401 to the subject and outputs the captured image to the image processing device 201.
- an infrared camera can be used as the imaging device 401.
- FIG. 5 shows an example of the field of view range of the imaging apparatus 401.
- the imaging device 401 in FIG. 5 is installed above the work table 102, and the visual field range 501 includes the work table 102, the left hand 502 and the right hand 503 of the worker 101, and the object 103 held by the right hand 503. Therefore, the work table 102, the left hand 502, the right hand 503, and the object 103 are shown in the distance image 432 captured by the imaging device 401.
- the storage unit 412 stores a background distance image 431 and a distance image 432.
- the background distance image 431 is a distance image captured in advance in a state where the left hand 502, the right hand 503, and the object 103 are not included in the visual field range 501.
- the detecting unit 211 detects a candidate area from the distance image 432 using the background distance image 431 and stores area information 433 indicating the detected candidate area in the storage unit 412.
- the candidate area is an area in which at least one of the left hand 502 or the right hand 503 of the worker 101 is estimated to be captured. For example, an area in contact with the end of the distance image 432 is detected as a candidate area.
- the line segment detection unit 421 obtains a curve by thinning the candidate area indicated by the area information 433, and obtains a plurality of connected line segments that approximate the curve as a plurality of line segments that approximate the candidate area. Then, the line segment detection unit 421 stores line segment information 434 indicating those line segments in the storage unit 412.
- the determination unit 422 obtains the angle between the line segments for each combination of two adjacent line segments included in the plurality of line segments indicated by the line segment information 434, and obtains the length in the three-dimensional space of the line segment corresponding to each line segment. Then, the determination unit 422 determines whether each line segment corresponds to the hand region using the obtained angles and lengths, and extracts line segments that approximate the hand region from the plurality of line segments indicated by the line segment information 434.
- when the angle between two adjacent line segments is smaller than a threshold θmin, for example, the determination unit 422 excludes, from the line segments that approximate the hand region, other line segments whose distance to the end of the distance image 432 is farther than those two line segments.
- θmin may be a threshold value representing the lower limit value of the bending angle of the wrist or elbow joint.
- in addition, the determination unit 422 determines whether or not to exclude, from the line segment candidates that approximate the hand region, the line segment that is farther from the end of the distance image 432 among two adjacent line segments. For this determination, the length in the three-dimensional space of the line segment that is closer to the end of the distance image 432 is used.
- from the remaining line segments, the determination unit 422 takes the line segment whose distance to the end of the distance image 432 is the farthest, and extracts from that line segment the portion that approximates the hand region, based on the distance from each of a plurality of points on the line segment to the contour of the candidate region. Then, the determination unit 422 stores hand region line segment information 435 indicating the extracted portion in the storage unit 412.
- the specifying unit 213 uses the contour of the candidate area indicated by the area information 433 and the line segment indicated by the hand area line segment information 435 to specify the position of the hand in the candidate area, and stores position information 436 indicating the specified position in the storage unit 412.
- the output unit 411 outputs a recognition result based on the position information 436.
- the output unit 411 may be a display unit that displays the recognition result on the screen, or may be a transmission unit that transmits the recognition result to another image processing apparatus.
- the recognition result may be a trajectory indicating a change in the three-dimensional position of the hand, or information indicating an operator's motion estimated from the hand trajectory.
- line segments corresponding to the object 103 can thus be excluded from the line segments approximating the candidate area detected from the distance image 432, based on the angles between line segments and the lengths of the line segments in the three-dimensional space. Then, by identifying the position of the hand on the remaining line segments, including the line segment that is farthest from the end of the distance image 432, the position of the right hand 503 can be specified with high accuracy even if the object 103 is an unknown object. Therefore, the recognition accuracy of the position of a hand in proximity to an unknown object is improved, and the recognition accuracy of the operator's motion is also improved.
- FIG. 6 is a flowchart showing a specific example of image processing performed by the image processing apparatus 201 in FIG.
- the detection unit 211 performs candidate area detection processing (step 601), and then the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 602).
- FIG. 7 is a flowchart showing an example of candidate area detection processing in step 601 of FIG.
- the detection unit 211 subtracts the pixel value of each pixel of the background distance image 431 from the pixel value (distance value) of each pixel of the distance image 432 to generate a difference image, and binarizes the generated difference image (step 701).
- FIG. 8 shows an example of the distance image 432 and the background distance image 431.
- FIG. 8A shows an example of the distance image 432
- FIG. 8B shows an example of the background distance image 431. Since the pixel values of the distance image 432 and the background distance image 431 represent the distance from the imaging device 401 to the subject, the pixel values are smaller as the subject is closer to the imaging device 401 and larger as the subject is farther from the imaging device 401. In this case, the difference between the background pixel values common to the distance image 432 and the background distance image 431 is close to 0, but the difference between the foreground pixel values closer to the imaging device 401 than the background is a negative value.
- the detection unit 211 compares the pixel value difference with T1, using a negative predetermined value as the threshold T1; if the difference is less than T1, the pixel value of the difference image is set to 255 (white), and if the difference is equal to or greater than T1, the pixel value of the difference image is set to 0 (black). Thereby, the difference image is binarized.
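- as a rough illustration of the binarization in step 701, the following is a minimal Python sketch, assuming NumPy arrays for the two distance images and a hypothetical value for the negative threshold T1:

```python
import numpy as np

def binarize_difference(distance_img, background_img, t1=-30):
    """Rough sketch of step 701: binarize the background difference.

    Pixels whose difference (distance image minus background distance image)
    is less than the negative threshold T1 become 255 (white, foreground);
    all other pixels become 0 (black).  t1 = -30 is a hypothetical value.
    """
    diff = distance_img.astype(np.int32) - background_img.astype(np.int32)
    return np.where(diff < t1, 255, 0).astype(np.uint8)
```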
- FIG. 9 shows an example of a binarized difference image.
- FIG. 9A shows an example of a binary image generated in step 701.
- when the difference image is binarized, not only the pixels of both hands of the operator but also the pixels of an object close to the hands are set to white.
- the detection unit 211 performs opening and closing for each pixel of the binary image (step 702). First, by performing the opening, white pixels are eroded and small white areas are removed. Thereafter, by performing the closing, black isolated points generated by the opening can be changed back to white. Thereby, the binary image of FIG. 9B is generated from the binary image of FIG. 9A.
- the detection unit 211 selects each white pixel as a target pixel, and obtains the difference between the pixel value of the target pixel in the distance image 432 and the pixel value in the distance image 432 of each white pixel adjacent to the target pixel above, below, to the left, and to the right. Then, the detection unit 211 compares the maximum absolute value of the differences with a predetermined threshold T2, and when the maximum value is equal to or greater than T2, changes the target pixel from a white pixel to a black pixel (step 703). Thereby, the binary image of FIG. 9C is generated from the binary image of FIG. 9B.
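- a minimal sketch of steps 702 and 703, assuming OpenCV is available and using hypothetical values for the structuring element size and the depth-difference threshold T2:

```python
import cv2
import numpy as np

def clean_binary(binary, distance_img, t2=50, kernel_size=3):
    """Sketch of steps 702-703: opening/closing, then removal of white pixels
    that lie on strong depth discontinuities (hypothetical t2 and kernel size)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)    # removes small white areas
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)   # fills black isolated points

    result = closed.copy()
    depth = distance_img.astype(np.int32)
    h, w = closed.shape
    for y, x in zip(*np.where(closed == 255)):
        diffs = [abs(depth[y, x] - depth[y + dy, x + dx])
                 for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= y + dy < h and 0 <= x + dx < w and closed[y + dy, x + dx] == 255]
        if diffs and max(diffs) >= t2:
            result[y, x] = 0   # target pixel on a depth discontinuity turns black
    return result
```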
- the detection unit 211 performs contour detection processing to detect a plurality of white areas, and selects a white area that satisfies the following condition as a candidate area from the detected white areas (step 704).
- the white area is in contact with the end of the binary image.
- the area of the white region is not less than a predetermined value.
- since the operator's arm extends from the lower end (body side) of the image toward the upper end, a white region in contact with the lower end of the binary image is selected as a candidate region (see the sketch below).
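- one possible realization of the selection in step 704, as a sketch assuming OpenCV contour detection and a hypothetical minimum-area threshold; candidate regions are required to touch the lower edge of the image:

```python
import cv2

def select_candidate_regions(binary, min_area=2000):
    """Sketch of step 704: keep white regions that touch the lower edge of the
    binary image and whose area is at least min_area (hypothetical value)."""
    h, _ = binary.shape
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for cnt in contours:
        touches_bottom = (cnt[:, 0, 1] == h - 1).any()
        if touches_bottom and cv2.contourArea(cnt) >= min_area:
            candidates.append(cnt)
    return candidates
```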
- one white region shown in FIG. 9D is selected from the binary image shown in FIG. 9C.
- the detection unit 211 smoothes the outline of the selected white area (step 705).
- the outline of the white region may be uneven due to the influence of wrinkles on clothes worn by the worker. Therefore, by performing the processing in step 705, such unevenness can be smoothed. For example, closing may be used as the smoothing process.
- by the smoothing process, the binary image shown in FIG. 9E is generated from the binary image shown in FIG. 9D.
- when the selected white region is in contact with the lower end of the binary image at two places, the detection unit 211 divides the white region into two white areas that each contact the lower end at only one place (step 706).
- the detection unit 211 generates region information 433 indicating the white region generated by the division.
- for example, the detection unit 211 obtains, from the contour of the white region, the contour part sandwiched between the two contour parts in contact with the lower end of the binary image, and obtains the x coordinate x1 of the topmost pixel (the pixel with the smallest y coordinate) among the pixels of the obtained contour part. Then, the detection unit 211 can divide the white area into two white areas by changing all white pixels whose x coordinate is x1 in the white area to black pixels.
- the binary image shown in FIG. 9F is generated from the binary image shown in FIG. 9E.
- the two white regions generated by the division correspond to a region including the left hand and a region including the right hand, respectively.
- the detection unit 211 compares the x coordinates of the contour portions included in the two white areas, determines the white area having the smaller x coordinates as the candidate area for the left hand, and determines the white area having the larger x coordinates as the candidate area for the right hand. In this case, the detection unit 211 sets a variable nHands, which represents the number of detected candidate regions, to 2.
- when two white areas touching the lower end of the binary image at only one place are selected in step 704, the detection unit 211 sets nHands to 2 without dividing any white area. When one white region that touches the lower end of the binary image at only one place is selected, the detection unit 211 sets nHands to 1.
- alternatively, the detection unit 211 uses a positive predetermined value as the threshold T1, sets the pixel value of the difference image to 255 (white) when the difference is larger than T1, and sets the pixel value of the difference image to 0 (black) when the difference is equal to or less than T1.
- when N is an integer of 3 or more, the detection unit 211 may select up to N white regions in descending order of area.
- FIG. 10 is a flowchart showing an example of the hand position detection process in step 602 of FIG.
- the extraction unit 212 sets 0 to the control variable i indicating the i-th candidate area among the candidate areas indicated by the area information 433 (step 1001), and compares i with nHands (step 1002). When i is less than nHands (step 1002, YES), the line segment detection unit 421 performs a line segment detection process for the i-th candidate region (step 1003).
- the determination unit 422 performs parameter calculation processing (step 1004), performs bending determination processing (step 1005), and performs length determination processing (step 1006). Then, the specifying unit 213 performs position specifying processing (step 1007).
- the extraction unit 212 increments i by 1 (step 1008), and repeats the processing after step 1002.
- when i reaches nHands (step 1002, NO), the extraction unit 212 ends the process.
- FIG. 11 is a flowchart showing an example of line segment detection processing in step 1003 of FIG.
- the line segment detection unit 421 thins the i-th candidate region (step 1101).
- the line segment detection unit 421 can thin the candidate region using a thinning algorithm such as the Tamura method, the Zhang-Suen method, or the NWG method described in Non-Patent Document 1.
- when a branch occurs during thinning, the line segment detection unit 421 leaves only the longest of the branches and generates a single curve composed of an array of consecutive points.
- the line segment detection unit 421 approximates the curve with a plurality of connected line segments, and generates line segment information 434 indicating those line segments (step 1102).
- for example, the line segment detection unit 421 can use the Ramer-Douglas-Peucker algorithm described in Non-Patent Document 2 to convert the curve into a plurality of approximate line segments so that the deviation between the curve and the approximate line segments falls within a predetermined tolerance.
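- a sketch of step 1102, assuming the thinned candidate region has already been reduced to an ordered array of curve points; cv2.approxPolyDP implements the Ramer-Douglas-Peucker algorithm, and the tolerance epsilon is a hypothetical value:

```python
import cv2
import numpy as np

def approximate_curve(curve_points, epsilon=5.0):
    """Sketch of step 1102: approximate a thinned curve by connected line segments
    with the Ramer-Douglas-Peucker algorithm (open curve, tolerance in pixels)."""
    curve = np.asarray(curve_points, dtype=np.int32).reshape(-1, 1, 2)
    approx = cv2.approxPolyDP(curve, epsilon, closed=False)
    return approx.reshape(-1, 2)   # end points pts[0], pts[1], ... of the segments
```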
- FIG. 12 shows an example of line segment information 434 in an array format.
- the array element number is identification information indicating one of the end points of each line segment in the binary image, and the x coordinate and the y coordinate represent a coordinate point (x, y) common to the binary image and the distance image 432.
- the point corresponding to array element number 0 is located at the lower end of the candidate region, and the corresponding point moves away from the lower end as the array element number increases.
- the distance value Z represents a pixel value corresponding to the coordinates (x, y) of the distance image 432.
- the j-th line segment and the j + 1-th line segment are adjacent to each other and are connected at a point corresponding to the array element number j + 1.
- when the array element numbers range from 0 to n, the line segment detection unit 421 sets a variable nPts, which represents the number of end points, to n + 1.
- FIG. 13 is a flowchart showing an example of parameter calculation processing in step 1004 of FIG.
- the determination unit 422 sets the control variable j, which represents an array element number of the line segment information 434, to 0 (step 1301), and compares j with nPts - 1 (step 1302). When j is less than nPts - 1 (step 1302, YES), the determination unit 422 calculates the length lenj in the three-dimensional space of the j-th line segment whose end points correspond to array element numbers j and j + 1 (step 1303).
- for example, the determination unit 422 can obtain the length lenj with a pinhole camera model, using the coordinates (x, y) of the two points corresponding to array element numbers j and j + 1.
- when the pinhole camera model is used, the length len1 in the three-dimensional space of the line segment between the point (x1, y1) and the point (x2, y2) in the binary image is calculated by the following formulas.
- X1 = (Z1 × (x1 - cx)) / fx (1)
- Y1 = (Z1 × (y1 - cy)) / fy (2)
- X2 = (Z2 × (x2 - cx)) / fx (3)
- Y2 = (Z2 × (y2 - cy)) / fy (4)
- len1 = ((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2)^(1/2) (5)
- Z1 and Z2 represent the distance value Z of the point (x1, y1) and the point (x2, y2), respectively, and (cx, cy) represents the coordinates of the principal point in the binary image.
- the center of the binary image is used as the principal point.
- fx and fy are focal lengths expressed in units of pixels in the x-axis direction and the y-axis direction, respectively.
- the origin of the coordinate system representing the coordinates (X, Y, Z) in the three-dimensional space may be the installation position of the imaging device 401.
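- the length calculation in step 1303 follows equations (1) to (5) directly; a minimal sketch, assuming the camera parameters fx, fy, cx, cy are known:

```python
import math

def segment_length_3d(p1, p2, z1, z2, fx, fy, cx, cy):
    """Length in three-dimensional space of the segment between image points
    p1 = (x1, y1) and p2 = (x2, y2) with distance values z1 and z2,
    following equations (1) to (5)."""
    X1 = z1 * (p1[0] - cx) / fx
    Y1 = z1 * (p1[1] - cy) / fy
    X2 = z2 * (p2[0] - cx) / fx
    Y2 = z2 * (p2[1] - cy) / fy
    return math.sqrt((X1 - X2) ** 2 + (Y1 - Y2) ** 2 + (z1 - z2) ** 2)
```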
- the determination unit 422 compares j with nPts - 2 (step 1304). When j is less than nPts - 2 (step 1304, YES), the determination unit 422 calculates the angle θj between the j-th line segment and the (j+1)-th line segment, which has end points corresponding to array element numbers j + 1 and j + 2 (step 1305).
- the determination unit 422 can obtain the angle θj by calculating the inner product described in Non-Patent Document 3, using the coordinates (x, y) of the three points corresponding to array element numbers j, j + 1, and j + 2.
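- a sketch of the angle calculation in step 1305, using the inner product of the two vectors that meet at pts[j + 1]; the angle is returned in degrees:

```python
import math

def angle_between_segments(p0, p1, p2):
    """Angle at p1 between the segments p0-p1 and p1-p2, via the inner product.

    A value close to 180 degrees means the two segments are nearly straight;
    smaller values indicate a bend at p1."""
    v1 = (p0[0] - p1[0], p0[1] - p1[1])
    v2 = (p2[0] - p1[0], p2[1] - p1[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))
```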
- the determination unit 422 increments j by 1 (step 1306), and repeats the processing after step 1302. If j reaches nPts - 2 (step 1304, NO), the determination unit 422 skips the process of step 1305 and performs the processing after step 1306. When j reaches nPts - 1 (step 1302, NO), the determination unit 422 ends the process.
- FIG. 14 is a flowchart illustrating an example of the bending determination process in step 1005 of FIG.
- in the bending determination process, a line segment estimated to exist beyond the hand is excluded based on the determination result for the angle θj between line segments.
- between the lower end of the candidate area, where the arm and hand are shown, and the hand, the maximum number of bends is two (elbow and wrist), and the third and subsequent bends are estimated to be caused by fingers or by an object other than the hand. Even when the number of bends is two or less, if the angle θj is smaller than the movable angle of the elbow or wrist, the bend is estimated to be caused by a finger or an object.
- the determination unit 422 sets the control variable j to 0, sets the variable Nbend representing the number of bends to 0 (step 1401), and compares j with nPts - 2 (step 1402). When j is less than nPts - 2 (step 1402, YES), the determination unit 422 compares θj and θmin (step 1403).
- θmin may be determined based on Non-Patent Document 4, for example.
- θmax is a threshold value for determining that there is no bending, and is set to a value larger than θmin and smaller than 180°.
- θmax may be an angle in the range of 150° to 170°.
- when θj is equal to or greater than θmin and less than θmax, the determination unit 422 increments Nbend by 1 (step 1405) and compares Nbend with 2 (step 1406). When Nbend is 2 or less (step 1406, NO), the determination unit 422 increments j by 1 (step 1408), and repeats the processing after step 1402.
- when Nbend is greater than 2 (step 1406, YES), the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434, and changes nPts from n + 1 to j + 2 (step 1407). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434.
- when θj is less than θmin (step 1403, NO), the determination unit 422 performs the process of step 1407. Thereby, when θj is smaller than the movable angle of the elbow or wrist, the line segment corresponding to the finger or the object is deleted from the line segment information 434.
- when θj is equal to or greater than θmax, the determination unit 422 determines that the point of array element number j + 1 does not correspond to bending, skips the processing of step 1405 and step 1406, and repeats the processing after step 1408.
- when j reaches nPts - 2 (step 1402, NO), the determination unit 422 ends the process.
- the number of line segments indicated by the line segment information 434 can be reduced, and line segment candidates that approximate the hand area can be narrowed down.
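- a compact sketch of the bending determination loop of FIG. 14; the threshold values θmin and θmax are hypothetical, and the function returns the number of end points to keep as the new nPts:

```python
def bending_determination(theta, n_pts, theta_min=90.0, theta_max=160.0):
    """Sketch of FIG. 14: theta[j] is the angle at pts[j+1] between the j-th and
    (j+1)-th line segments.  An angle below theta_min, or a third bend, is taken
    to be caused by a finger or an object, and only pts[0] .. pts[j+1] are kept."""
    n_bend = 0
    for j in range(n_pts - 2):
        if theta[j] < theta_min:
            return j + 2                  # step 1407: nPts becomes j + 2
        if theta[j] < theta_max:          # a bend at pts[j+1]
            n_bend += 1
            if n_bend > 2:
                return j + 2              # more than two bends: cut here
    return n_pts                          # no cut: keep all end points
```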
- FIG. 15 shows an example of four line segments to be subjected to bending determination processing.
- θj represents the angle between the line segments, calculated from the three points pts[j], pts[j + 1], and pts[j + 2].
- a line segment with both ends of pts [0] and pts [1] corresponds to the upper arm
- a line segment with both ends of pts [1] and pts [2] corresponds to the forearm
- pts [1] corresponds to the elbow.
- line segments having both ends of pts [2] and pts [3] correspond to the hand
- pts [2] corresponds to the wrist.
- a line segment having both ends of pts [3] and pts [4] corresponds to an object held by the hand
- pts [3] corresponds to a finger joint.
- FIG. 16 is a flowchart illustrating an example of the length determination process in step 1006 of FIG.
- in the length determination process, a line segment presumed to exist beyond the hand is excluded based on the determination results for the angle θj between line segments and the length lenj of each line segment.
- the determination unit 422 sets the control variable j to 0, sets the variable len representing the length in the three-dimensional space to 0, and sets the variable ctr representing the number of line segments to 0 (step 1601).
- j and nPts - 1 are compared (step 1602).
- when j is less than nPts - 1 (step 1602, YES), the determination unit 422 adds lenj to len (step 1603) and compares θj with θmax (step 1604).
- when θj is less than θmax (step 1604, NO), the determination unit 422 compares len and lenmax (step 1605).
- lenmax is an upper limit value of the length of the forearm, and can be determined based on the length of the forearm described in Non-Patent Document 4, for example. Further, when the height of the worker is known, lenmax may be determined from the height based on the Vitruvian human figure of Leonardo da Vinci.
- when len is greater than lenmax (step 1605, YES), the determination unit 422 deletes the points after the array element number j + 2 from the line segment information 434 and changes nPts to j + 2 (step 1606). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434. Then, the determination unit 422 performs the normal line determination processing in steps 1613 to 1620.
- when θj is equal to or greater than θmax (step 1604, YES), the determination unit 422 determines that the point of array element number j + 1 does not correspond to bending, increments j by 1 (step 1611), and repeats the processing after step 1602. If j reaches nPts - 1 (step 1602, NO), the determination unit 422 performs the normal line determination processing in steps 1613 to 1620.
- when len is equal to or less than lenmax (step 1605, NO), the determination unit 422 compares len and lenmin (step 1607). When len is equal to or greater than lenmin (step 1607, NO), the determination unit 422 sets len to 0 (step 1612), and performs the processing after step 1611.
- when len is less than lenmin (step 1607, YES), the determination unit 422 increments ctr by 1 (step 1608), and compares ctr with 2 (step 1609). When ctr is 2 or less (step 1609, NO), the determination unit 422 performs the processing after step 1612.
- when ctr is greater than 2 (step 1609, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 1610). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434. Then, the determination unit 422 performs the normal line determination processing in steps 1613 to 1620.
- FIG. 17 shows an example of four line segments to be subjected to length determination processing.
- len0 corresponds to the length of a part of the upper arm
- len1 corresponds to the length of the forearm
- len2 corresponds to the length of the hand
- len3 corresponds to the length of the object held by the hand.
- FIG. 18 shows an example of three line segments to be subjected to length determination processing.
- len0 corresponds to the length of a part of the forearm
- len1 corresponds to the length of the hand
- len2 corresponds to the length of the object held by the hand.
- in the normal line determination processing, a straight line is drawn through each of a plurality of points on the line segment whose distance to the lower end of the candidate area is the farthest, the two intersection points of each straight line and the outline of the candidate area are obtained, and a portion approximating the hand region is extracted from the line segment.
- the line segment with the longest distance to the lower end of the candidate area is a line segment having both ends of pts [nPts-2] and pts [nPts-1], and is estimated to be a line segment corresponding to the hand.
- the determination unit 422 divides the line segment L having both ends of pts[nPts-2] and pts[nPts-1] at intervals of a predetermined number of pixels, so that m points (m is an integer of 2 or more) are obtained on the line segment L, and obtains a normal line perpendicular to the line segment L at each point (step 1613). Then, the determination unit 422 calculates the intersections between each of the m normals and the contour of the candidate area. Since the line segment L exists inside the candidate area and the contour of the candidate area exists on both sides of the line segment L, two intersections between each normal line and the contour are obtained.
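- a rough sketch of step 1613, assuming the candidate area is given as a binary image and the line segment L is given by its two end points in (x, y) pixel coordinates; each normal is walked in both directions until the white region is left, which approximates its two intersections with the contour:

```python
import numpy as np

def normal_chords(binary, p_start, p_end, step_px=5):
    """Sketch of step 1613: sample m points along the segment L = p_start..p_end
    and, at each point, follow the normal in both directions while staying inside
    the candidate region; the last white pixels approximate the two intersections
    of the normal with the region contour."""
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    direction = p_end - p_start
    length = float(np.linalg.norm(direction))
    direction /= length
    normal = np.array([-direction[1], direction[0]])   # perpendicular to L
    h, w = binary.shape

    chords = []
    for t in np.arange(0.0, length + 1e-9, step_px):
        p = p_start + t * direction
        ends = []
        for sign in (1.0, -1.0):
            q = p.copy()
            while True:
                nq = q + sign * normal
                x, y = int(round(nq[0])), int(round(nq[1]))
                if not (0 <= x < w and 0 <= y < h) or binary[y, x] == 0:
                    break
                q = nq
            ends.append(q)
        chords.append((p, ends[0], ends[1]))   # sample point and two contour hits
    return chords
```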
- the determination unit 422 sets 0 to a control variable k indicating one normal line (step 1614), and compares k and m (step 1615).
- when k is less than m (step 1615, YES), the determination unit 422 obtains the length n_lenk in the three-dimensional space between the two intersections of the k-th normal line and the contour, and compares n_lenk with n_lenmax (step 1617). n_lenmax is an upper limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.
- when n_lenk is equal to or smaller than n_lenmax (step 1617, NO), the determination unit 422 compares n_lenk and n_lenmin (step 1618).
- n_lenmin is a lower limit value of the hand width, and may be determined based on the hand width described in Non-Patent Document 4, for example, or may be determined based on a Vitruvian human figure.
- when n_lenk is greater than or equal to n_lenmin (step 1618, NO), the determination unit 422 increments k by 1 (step 1620), and repeats the processing after step 1615.
- when k reaches m (step 1615, NO), the determination unit 422 ends the process.
- when n_lenk is larger than n_lenmax (step 1617, YES), the determination unit 422 obtains the intersection between the k-th normal and the line segment L, and records the obtained intersection as pts[nPts-1]′ (step 1619). As a result, the portion from the point pts[nPts-2] to the point pts[nPts-1]′ on the line segment L is extracted as the portion approximating the hand region. Then, the determination unit 422 generates hand region line segment information 435 including the x coordinate, the y coordinate, and the distance value Z of pts[nPts-2] and pts[nPts-1]′.
- when n_lenk is less than n_lenmin (step 1618, YES), the determination unit 422 likewise performs the process of step 1619.
- FIG. 19 shows an example of a line segment to be subjected to normal line determination processing.
- a line segment corresponding to an object that was not deleted by the bending determination process can thus be deleted, a line segment approximating the hand region can be specified, and the portion of the specified line segment that approximates the hand region can be identified.
- FIG. 20 is a flowchart showing an example of the position specifying process in step 1007 of FIG.
- the center position of the palm is specified by obtaining the maximum inscribed circle inscribed in the outline of the candidate area in the hand area corresponding to the line segment indicated by the hand area line segment information 435.
- the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the closed finger area. Further, since the width of the palm is wider than the width of the wrist, the inscribed circle at the center of the palm is considered to be larger than the inscribed circle in the region of the wrist. Therefore, the center of the maximum inscribed circle in the hand region can be regarded as the center of the palm.
- the specifying unit 213 sets a scanning range between the point pts[nPts-2] and the point pts[nPts-1]′, and obtains, for each scanning point in the scanning range, an inscribed circle that is centered on the scanning point and inscribed in the outline of the candidate area (step 2001).
- the specifying unit 213 obtains the coordinates (xp, yp) of the center of the largest inscribed circle among the inscribed circles centered on each of the plurality of scanning points.
- the scanning range may be a line segment having the points pts[nPts-2] and pts[nPts-1]′ as both ends, or may be an area in which that line segment is expanded by a predetermined number of pixels in the x direction and the y direction.
- the specifying unit 213 obtains the minimum value d of the distance from each scanning point to the contour of the candidate area, and obtains the coordinates (xmax, ymax) of the scanning point at which the minimum value d is maximum. Then, the specifying unit 213 records the coordinates (xmax, ymax) as the coordinates (xp, yp) of the center of the maximum inscribed circle. In this case, the minimum distance d at the scanning point (xmax, ymax) is the radius of the maximum inscribed circle.
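- a sketch of step 2001, assuming the candidate-area contour is available as an OpenCV contour and the scanning points are given as (x, y) pixel coordinates; cv2.pointPolygonTest with a true measureDist flag returns the signed distance to the contour:

```python
import cv2

def max_inscribed_circle(contour, scan_points):
    """Sketch of step 2001: among the scanning points, find the one whose minimum
    distance to the candidate-area contour is largest; that point is the centre
    (xp, yp) of the maximum inscribed circle and the distance d is its radius."""
    best_center, best_radius = None, -1.0
    for x, y in scan_points:
        d = cv2.pointPolygonTest(contour, (float(x), float(y)), True)
        if d > best_radius:
            best_center, best_radius = (x, y), d
    return best_center, best_radius
```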
- FIG. 21 shows an example of the maximum inscribed circle.
- a line segment 2101 having the points pts[nPts-2] and pts[nPts-1]′ as both ends is set as the scanning range, and the maximum inscribed circle 2103 centered on a point 2102 on the line segment 2101 is obtained.
- the specifying unit 213 obtains the point in the three-dimensional space corresponding to the center of the maximum inscribed circle, and generates position information 436 indicating the position of the obtained point (step 2002). For example, the specifying unit 213 can obtain the coordinates (Xp, Yp, Zp) of the point in the three-dimensional space with the pinhole camera model, using the coordinates (xp, yp) of the center of the maximum inscribed circle and the distance value Zp. In this case, Xp and Yp are calculated in the same manner as in equations (1) to (4): Xp = (Zp × (xp - cx)) / fx and Yp = (Zp × (yp - cy)) / fy.
- FIG. 22 shows a second specific example of the image processing apparatus 201 of FIG.
- the image processing apparatus 201 in FIG. 22 has a configuration in which a detection unit 2201 and an upper limb detection unit 2202 are added to the image processing apparatus 201 in FIG. 4, and obtains the position of the hand and the upper limb from the distance image 432.
- FIG. 23 shows an example of the visual field range of the imaging device 401 of FIG.
- the imaging device 401 in FIG. 23 is installed above the work table 102, and the visual field range 2301 includes the work table 102, the upper body 2302 including the left hand 502 and the right hand 503 of the worker 101, and the object 2303 gripped by the right hand 503. Therefore, the work table 102, the upper body 2302, and the object 2303 are shown in the distance image 432 captured by the imaging device 401.
- the XW axis, YW axis, and ZW axis represent the world coordinate system, and the origin O is provided on the floor of the work place.
- the XW axis is provided in parallel with the long side of the work table 102 and represents the direction from the left shoulder to the right shoulder of the worker 101.
- the YW axis is provided in parallel with the short side of the work table 102 and represents the direction from the front to the back of the worker 101.
- the ZW axis is provided perpendicular to the floor surface and represents a direction from the floor surface toward the top of the worker 101.
- the detection unit 2201 detects a body region from the distance image 432 using the background distance image 431 and stores region information 2211 indicating the detected body region in the storage unit 412.
- the body region is a region where the upper body 2302 of the worker 101 is estimated to be captured.
- the upper limb detection unit 2202 identifies the position of the upper limb, including the wrist, elbow, or shoulder, using the distance image 432, the region information 2211, and the center position of the palm, and stores position information 2212 indicating the identified position in the storage unit 412.
- the output unit 411 outputs a recognition result based on the position information 436 and the position information 2212.
- the recognition result may be a trajectory indicating a change in the three-dimensional position of the hand and the upper limb, or may be information indicating an operator's motion estimated from the trajectory of the hand and the upper limb.
- according to the image processing apparatus 201 in FIG. 22, it is possible to recognize the movement of the worker in consideration of not only the movement of the hand but also the movement of the upper limb, and the recognition accuracy of the movement of the worker is further improved.
- FIG. 24 is a flowchart illustrating a specific example of image processing performed by the image processing apparatus 201 in FIG.
- the distance image 432 is divided into two areas, a hand detection area and a body detection area.
- FIG. 25 shows an example of a distance image 432 and a background distance image 431 obtained by photographing the visual field range 2301 of FIG.
- FIG. 25A shows an example of the distance image 432
- FIG. 25B shows an example of the background distance image 431.
- the distance image 432 is divided into a hand detection area 2501 and a body detection area 2502, and the background distance image 431 is similarly divided into two areas.
- the detection unit 2201 performs body region detection processing using the body detection area of the distance image 432 (step 2401), and the detection unit 211 performs candidate area detection processing using the hand detection area of the distance image 432 (step 2402).
- the extraction unit 212 and the specifying unit 213 perform hand position detection processing (step 2403), and the upper limb detection unit 2202 performs upper limb position detection processing (step 2404).
- FIG. 26 is a flowchart showing an example of body region detection processing in step 2401 of FIG.
- the detection unit 2201 subtracts the pixel value of each pixel in the body detection area of the background distance image 431 from the pixel value of each pixel in the body detection area of the distance image 432 to generate a difference image, and binarizes the generated difference image (step 2601).
- the detection unit 2201 performs opening and closing for each pixel of the binary image (step 2602). Then, the detection unit 2201 extracts, as a body region, a white region having the maximum area among the white regions included in the binary image (step 2603), and obtains the center of gravity of the body region (step 2604). For example, the detection unit 2201 can obtain the coordinates of the center of gravity of the body region by calculating the average value of the x and y coordinates of each of the plurality of pixels included in the body region.
- the detection unit 2201 obtains the position of the head shown in the body region (step 2605). For example, the detection unit 2201 generates a histogram of the distance values of the pixels included in the body region, and determines, from the generated histogram, a threshold THD such that the number of pixels having a distance value equal to or less than THD is greater than or equal to a predetermined number. Next, the detection unit 2201 selects the largest region among the regions composed of pixels having a distance value equal to or less than the threshold THD as the head region, and obtains the coordinates of the center of gravity of the head region as the head position. Then, the detection unit 2201 generates region information 2211 indicating the coordinates of the centers of gravity of the body region and the head region.
- the detection unit 211 performs a candidate area detection process similar to that of FIG. 7 using the hand detection areas of the distance image 432 and the background distance image 431.
- FIG. 27 is a flowchart showing an example of the hand position detection process in step 2403 of FIG.
- the processing in Step 2701 to Step 2706, Step 2708, and Step 2709 in FIG. 27 is the same as the processing in Step 1001 to Step 1008 in FIG.
- the determination unit 422 performs a three-dimensional direction determination process after performing the length determination process (step 2707).
- FIG. 28 is a flowchart showing an example of the three-dimensional direction determination process in step 2707 of FIG. 27. In the three-dimensional direction determination process, line segments presumed to exist beyond the hand are excluded based on the result of comparing the direction in the three-dimensional space of each line segment with the direction in the three-dimensional space of the subject shown in the area including that line segment.
- the determination unit 422 sets the control variable j to 0 (step 2801), and compares j with nPts - 1 (step 2802). When j is less than nPts - 1 (step 2802, YES), the determination unit 422 obtains the coordinates (Xj, Yj, Zj) in the three-dimensional space of pts[j] and the coordinates (Xj+1, Yj+1, Zj+1) in the three-dimensional space of pts[j+1] (step 2803). For example, the determination unit 422 can obtain (Xj, Yj, Zj) and (Xj+1, Yj+1, Zj+1) using the pinhole camera model.
- next, the determination unit 422 uses (Xj, Yj, Zj) and (Xj+1, Yj+1, Zj+1) to obtain a direction vector Vj indicating the direction in the three-dimensional space of the line segment Lj having pts[j] and pts[j+1] as both ends (step 2804).
- for example, the determination unit 422 can determine the vector from (Xj, Yj, Zj) to (Xj+1, Yj+1, Zj+1) in the three-dimensional space as Vj.
- alternatively, the determination unit 422 may obtain the coordinates in the three-dimensional space of each of a plurality of points on the line segment Lj and obtain a vector that approximates the curve connecting those coordinates as Vj.
- the determination unit 422 sets a peripheral region of the line segment Lj in the candidate region, obtains the coordinates in the three-dimensional space of each of a plurality of points in the peripheral region, and performs principal component analysis on those coordinates (step 2805). Then, the determination unit 422 obtains the first principal component vector EVj from the result of the principal component analysis (step 2806). EVj indicates the direction in the three-dimensional space of the subject shown in the area including the line segment Lj.
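- a minimal sketch of steps 2805 and 2806, assuming the three-dimensional coordinates of the points sampled from the peripheral region are already available:

```python
import numpy as np

def first_principal_component(points_3d):
    """Sketch of steps 2805-2806: first principal component vector EV of the 3D
    points sampled from the peripheral region of a line segment."""
    pts = np.asarray(points_3d, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, np.argmax(eigvals)]         # eigenvector of the largest one
```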
- for example, the line segment L3 is a line segment having pts[3] and pts[4] as both ends.
- the determination unit 422 obtains a normal line passing through pts[3] and a normal line passing through pts[4], and can set the area 2901 surrounded by the two normal lines and the contour of the candidate area as the peripheral region of the line segment L3.
- the determination unit 422 obtains the angle γj between Vj and EVj (step 2807), and compares γj with a threshold γth (step 2808). When γj is less than γth (step 2808, NO), the determination unit 422 increments j by 1 (step 2810), and repeats the processing after step 2802. If j reaches nPts - 1 (step 2802, NO), the determination unit 422 ends the process.
- when γj is equal to or greater than γth (step 2808, YES), the determination unit 422 deletes the points after the array element number j + 1 from the line segment information 434 and changes nPts to j + 1 (step 2809). As a result, the line segment corresponding to the finger or the object is deleted from the line segment information 434.
- FIG. 30 shows an example of the first principal component vectors EV0 to EV3.
- EV0 to EV2 are obtained by principal component analysis on the peripheral regions of the line segments L0 to L2, and correspond to the regions of the upper arm, forearm, and hand, respectively.
- the directions of EV0 to EV2 are close to the directions of the direction vectors V0 to V2 of L0 to L2.
- on the other hand, the direction of EV3, which corresponds to the region of the object 3001 held by the hand, differs significantly from the direction of the direction vector V3 of the line segment L3, so the angle γ3 between V3 and EV3 becomes larger than γth. Therefore, pts[4] is deleted from the line segment information 434, and nPts is changed from 5 to 4. Thereby, the line segment corresponding to the object 3001 is deleted.
- FIG. 31 is a flowchart showing an example of the upper limb position detection process in step 2404 of FIG.
- the upper limb detection unit 2202 obtains the position of the center of gravity of the head region indicated by the region information 2211 in the world coordinate system (step 3101).
- the upper limb detection unit 2202 can obtain coordinates in a three-dimensional space using a pinhole camera model using the coordinates of the center of gravity of the head region and the distance value.
- the upper limb detection unit 2202 converts the obtained coordinates into coordinates (XWH, YWH, ZWH) in the world coordinate system illustrated in FIG. 23 using an RT matrix based on the external parameters of the imaging device 401.
- the external parameters of the imaging device 401 include the height from the floor surface to the installation position of the imaging device 401 and the tilt angle of the imaging device 401, and the RT matrix includes the three-dimensional coordinate system and the world coordinates with the imaging device 401 as the origin. Represents rotation and translation between systems.
- the obtained ZWH represents the height from the floor surface to the top of the head, and corresponds to the approximate height of the operator.
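- a sketch of the RT-matrix conversion used in step 3101; the exact external parameters are not given in the text, so the rotation about the XW axis by the sensor tilt angle and the translation by the sensor height are assumptions for illustration:

```python
import numpy as np

def camera_to_world(point_cam, tilt_deg, sensor_height):
    """Sketch of the RT-matrix conversion: rotate a camera-coordinate point and
    translate it into the world coordinate system (assumed extrinsic model)."""
    t = np.radians(tilt_deg)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t),  np.cos(t)]])   # rotation about the XW axis
    T = np.array([0.0, 0.0, sensor_height])        # translation by the sensor height
    return R @ np.asarray(point_cam, dtype=float) + T
```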
- the upper limb detection unit 2202 obtains coordinates in the three-dimensional space of the center of gravity of the body area indicated by the area information 2211 and converts the obtained coordinates into coordinates (XWB, YWB, ZWB) in the world coordinate system using the RT matrix. (Step 3102).
- the upper limb detection unit 2202 obtains the approximate positions of both shoulders in the world coordinate system using ZWH as the height (step 3103). For example, the upper limb detection unit 2202 determines the ratio of the shoulder height to the body height and the ratio of the shoulder width to the body height based on the Vitruvian human figure, and can obtain the left shoulder height (ZW coordinate) ZWLS, the right shoulder height ZWRS, and the shoulder width SW.
- the upper limb detection unit 2202 obtains XWB - SW / 2 as the lateral position (XW coordinate) XWLS of the left shoulder, obtains XWB + SW / 2 as the lateral position XWRS of the right shoulder, and determines YWB as the YW coordinate of both the left shoulder and the right shoulder.
- the upper limb detection unit 2202 may use XWH and YWH instead of XWB and YWB.
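- a sketch of the shoulder estimate in step 3103; the ratios to body height are hypothetical placeholders for the Vitruvian-figure proportions mentioned in the text:

```python
def shoulder_positions(xwb, ywb, zwh, width_ratio=0.25, height_ratio=0.82):
    """Sketch of step 3103: approximate both shoulder positions in the world
    coordinate system from the body centre of gravity (XWB, YWB) and the body
    height ZWH, using hypothetical ratios to the height."""
    sw = width_ratio * zwh                        # shoulder width SW
    zw_shoulder = height_ratio * zwh              # shoulder height ZWLS = ZWRS
    left = (xwb - sw / 2.0, ywb, zw_shoulder)     # XWLS = XWB - SW/2
    right = (xwb + sw / 2.0, ywb, zw_shoulder)    # XWRS = XWB + SW/2
    return left, right
```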
- the upper limb detection unit 2202 sets 0 to the control variable i indicating the i-th candidate region (step 3104), and compares i with nHands (step 3105). When i is less than nHands (step 3105, YES), the upper limb detection unit 2202 obtains the wrist position based on the palm center position obtained by the specifying unit 213 in the i-th candidate region (step 3106). . Then, the upper limb detection unit 2202 converts the coordinates of the wrist position into the coordinates of the world coordinate system using the RT matrix.
- for example, the upper limb detection unit 2202 can determine, as the wrist position, the point among pts[0] to pts[nPts-1] indicated by the line segment information 434 that is on the side closer to the lower end (body side) of the candidate region than the center position of the palm and that is closest to the center position of the palm.
- the upper limb detection unit 2202 may determine a point on the line segment that exists at a position away from the center position of the palm by a predetermined distance as the position of the wrist.
- FIG. 32 shows an example of the position of the wrist.
- among pts[0] to pts[2], which exist closer to the lower end of the candidate region than the palm center position 3201, pts[2], which is closest to the palm center position 3201, is determined as the wrist position.
- the upper limb detection unit 2202 obtains the elbow position based on the wrist position in the i-th candidate region, and converts the elbow position coordinates into coordinates in the world coordinate system using the RT matrix (step 3107). ).
- for example, the upper limb detection unit 2202 determines the ratio of the forearm length to the body height based on the Vitruvian human figure, obtains the forearm length from ZWH, and can convert it into the length of the forearm on the image in the candidate region.
- the upper limb detection unit 2202 obtains a point separated from the wrist position by the length of the forearm on the image, toward the side closer to the lower end of the candidate region, and determines, among pts[0] to pts[nPts-1], a point existing within a predetermined error range from the obtained point as the elbow position. In the example of FIG. 32, pts[1] is determined as the elbow position.
- when no point indicated by the line segment information 434 exists within the predetermined error range, the upper limb detection unit 2202 may obtain the intersection between a circle centered on the wrist position, with the forearm length on the image as its radius, and the line segments indicated by the line segment information 434, and determine the obtained intersection as the elbow position. When the elbow is not shown in the hand detection area of the distance image 432 and there is no intersection between the circle and the line segments, the upper limb detection unit 2202 determines, as the elbow position, a point on the extension line of the line segment connecting the wrist position and pts[0], separated from the wrist position by the length of the forearm on the image.
- the upper limb detection unit 2202 corrects the position of the shoulder based on the position of the elbow in the world coordinate system (step 3108). For example, the upper limb detection unit 2202 can determine the ratio of the length of the upper arm to the height based on the Vitruvian human figure, and can determine the length of the upper arm from the ZWH. Then, the upper limb detection unit 2202 obtains a three-dimensional distance UALen between the elbow coordinates and the shoulder coordinates in the world coordinate system.
- when UALen does not match the length of the upper arm, the upper limb detection unit 2202 moves the position of the shoulder on the three-dimensional straight line connecting the elbow coordinates and the shoulder coordinates so that UALen matches the length of the upper arm; otherwise, the upper limb detection unit 2202 does not correct the coordinates of the shoulder. Then, the upper limb detection unit 2202 generates position information 2212 indicating the positions of the wrist, elbow, and shoulder in the world coordinate system.
- the upper limb detection unit 2202 increments i by 1 (step 3109), and repeats the processing after step 3105.
- when i reaches nHands (step 3105, NO), the upper limb detection unit 2202 ends the process.
- the upper limb detection unit 2202 may use a predetermined value set in advance as the height instead of ZWH.
- FIG. 33 shows a functional configuration example of an image processing system including the image processing apparatus 201 of FIG. 4 or FIG.
- the image processing system in FIG. 33 includes an image processing device 201 and an image processing device 3301.
- a transmission unit 3311 in the image processing apparatus 201 corresponds to the output unit 411 in FIG. 4 or FIG. 22, and transmits a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212 via a communication network.
- the image processing apparatus 201 generates a plurality of recognition results in each of a plurality of time zones, and transmits the recognition results to the image processing apparatus 3301 in time series.
- the image processing device 3301 includes a reception unit 3321, a display unit 3322, and a storage unit 3323.
- the receiving unit 3321 receives a plurality of recognition results in time series from the image processing apparatus 201, and the storage unit 3323 stores the received plurality of recognition results in association with each of a plurality of time zones.
- the display unit 3322 displays the time-series recognition results stored in the storage unit 3323 on the screen.
- the image processing apparatus 201 may be a server installed at a work site such as a factory, or may be a server on the cloud that communicates with the imaging apparatus 401 via a communication network.
- the image processing device 3301 may be a server or a terminal device of an administrator who monitors the operation of the worker.
- the functions of the image processing apparatus 201 in FIG. 4 or 22 can be distributed and implemented in a plurality of apparatuses connected via a communication network.
- the detection unit 211, the extraction unit 212, the specifying unit 213, the detection unit 2201, and the upper limb detection unit 2202 may be provided in different devices.
- the configuration of the image processing apparatus 201 in FIGS. 2, 4, and 22 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing apparatus 201.
- the configuration of the image processing system in FIG. 33 is merely an example, and some components may be omitted or changed according to the use or conditions of the image processing system.
- when the process is simplified, the processes of step 702, step 703, and step 705 can be omitted.
- the detection unit 211 may select a white region that is in contact with the upper end, the left end, or the right end of the binary image as a candidate region.
- either the bending determination process in step 1005 or the length determination process in step 1006 may be omitted.
- the specifying unit 213 may determine the midpoint of the line segment indicated by the hand region line segment information 435 as the center position of the palm instead of the center of the maximum inscribed circle.
- the process of step 2602 can be omitted.
- in the hand position detection process of FIG. 27, either the bending determination process in step 2705 or the length determination process in step 2706 may be omitted.
- the determination unit 422 may determine a vector indicating the direction in the three-dimensional space of the subject shown in the region including the line segment L j by a method other than the principal component analysis.
- the upper limb detection unit 2202 does not need to obtain all positions of the shoulder, wrist, and elbow.
- the upper limb detection unit 2202 may generate position information 2212 indicating the position of any one of a shoulder, a wrist, and an elbow.
- the installation position of the three-dimensional distance sensor in FIG. 1 and the installation position of the imaging device in FIGS. 5 and 23 are merely examples, and the three-dimensional distance sensor or the imaging device may be installed at a position where the operator can be photographed from another angle.
- the distance image and the background distance image change according to the subject existing in the visual field range of the imaging apparatus.
- the hand detection area and the body detection area in FIG. 25 are merely examples, and a hand detection area and a body detection area having different positions or different shapes may be used.
- the line segment information in FIG. 12 and the line segments in FIGS. 15, 17 to 19, and 32 are merely examples, and the line segment information and the line segments change according to the captured distance image.
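- As one hedged example of how such approximating line segments could be computed, the sketch below thins the candidate region to a curve with scikit-image and approximates it with line segments via cv2.approxPolyDP; the greedy point ordering assumes the skeleton forms a single open path, which may not hold for every distance image.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize


def approximate_with_segments(candidate_mask: np.ndarray, epsilon: float = 5.0) -> np.ndarray:
    """candidate_mask: binary image of the candidate region.
    Returns polyline vertices; consecutive vertices are the endpoints of the
    line segments that approximate the thinned curve."""
    skeleton = skeletonize(candidate_mask > 0)
    ys, xs = np.nonzero(skeleton)
    points = list(zip(xs.tolist(), ys.tolist()))
    if not points:
        return np.empty((0, 2), dtype=np.int32)

    def neighbors(p):
        return [q for q in points
                if q != p and abs(q[0] - p[0]) <= 1 and abs(q[1] - p[1]) <= 1]

    # Start from a skeleton endpoint (a pixel with exactly one neighbor), if any.
    start = next((p for p in points if len(neighbors(p)) == 1), points[0])

    # Greedy nearest-neighbor ordering of the skeleton pixels along the path.
    ordered = [start]
    points.remove(start)
    while points:
        last = ordered[-1]
        nearest = min(points, key=lambda q: (q[0] - last[0]) ** 2 + (q[1] - last[1]) ** 2)
        points.remove(nearest)
        ordered.append(nearest)

    curve = np.array(ordered, dtype=np.int32).reshape(-1, 1, 2)
    approx = cv2.approxPolyDP(curve, epsilon, closed=False)
    return approx.reshape(-1, 2)
```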
- the maximum inscribed circle in FIG. 21, the peripheral region in FIG. 29, and the vector of the first principal component in FIG. 30 are merely examples; the maximum inscribed circle, the peripheral region, and the vector of the first principal component change according to the captured distance image.
- a peripheral region of another shape may be used.
- FIG. 34 shows a configuration example of an information processing apparatus (computer) used as the image processing apparatus 201 in FIGS. 2, 4, and 22.
- the information processing apparatus in FIG. 34 includes a central processing unit (CPU) 3401, a memory 3402, an input device 3403, an output device 3404, an auxiliary storage device 3405, a medium driving device 3406, and a network connection device 3407. These components are connected to each other by a bus 3408.
- the imaging device 401 may be connected to the bus 3408.
- the memory 3402 is a semiconductor memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), and a flash memory, and stores programs and data used for processing.
- the memory 3402 can be used as the storage unit 412.
- the CPU 3401 executes a program using the memory 3402, thereby operating as the detection unit 211, the extraction unit 212, the specifying unit 213, the line segment detection unit 421, the determination unit 422, the detection unit 2201, and the upper limb detection unit 2202.
- the input device 3403 is, for example, a keyboard, a pointing device, etc., and is used for inputting an instruction or information from an operator or a user.
- the output device 3404 is, for example, a display device, a printer, a speaker, or the like, and is used to output an inquiry or processing result to the operator or the user.
- the processing result may be a recognition result based on the position information 436 or a recognition result based on the position information 436 and the position information 2212.
- the output device 3404 can be used as the output unit 411.
- the auxiliary storage device 3405 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, a tape device, or the like.
- the auxiliary storage device 3405 may be a flash memory or a hard disk drive.
- the information processing apparatus can store a program and data in the auxiliary storage device 3405 and load them into the memory 3402 for use.
- the auxiliary storage device 3405 can be used as the storage unit 412.
- the medium driving device 3406 drives a portable recording medium 3409 and accesses the recorded contents.
- the portable recording medium 3409 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like.
- the portable recording medium 3409 may be a Compact Disk Read Only Memory (CD-ROM), a Digital Versatile Disk (DVD), a Universal Serial Bus (USB) memory, or the like. An operator or user can store programs and data in the portable recording medium 3409 and load them into the memory 3402 for use.
- the computer-readable recording medium that stores the programs and data used for the processing, such as the memory 3402, the auxiliary storage device 3405, or the portable recording medium 3409, is a physical (non-transitory) recording medium.
- the network connection device 3407 is a communication interface that is connected to a communication network such as a local area network or a wide area network, and performs data conversion accompanying communication.
- the information processing apparatus can receive a program and data from an external apparatus via the network connection apparatus 3407, and can use them by loading them into the memory 3402.
- the network connection device 3407 can be used as the output unit 411 or the transmission unit 3311.
- the information processing apparatus does not have to include all the components shown in FIG. 34, and some components may be omitted depending on the application or conditions. For example, if it is not necessary to input an instruction or information from an operator or user, the input device 3403 may be omitted. When the portable recording medium 3409 or the communication network is not used, the medium driving device 3406 or the network connection device 3407 may be omitted.
- the information processing apparatus in FIG. 34 can also be used as the image processing apparatus 3301 in FIG.
- in this case, the network connection device 3407 is used as the reception unit 3321, the memory 3402 or the auxiliary storage device 3405 is used as the storage unit 3323, and the output device 3404 is used as the display unit 3322.
- (Appendix 1) An image processing apparatus comprising: a detection unit that detects a candidate area in which a hand is captured from a distance image; an extraction unit that extracts a line segment that approximates a hand region from a plurality of line segments that approximate the candidate area, based on an angle between two adjacent line segments among the plurality of line segments; and a specifying unit that specifies the position of the hand in the candidate area using the line segment that approximates the hand region.
- (Appendix 2) The image processing apparatus according to appendix 1, wherein the detection unit detects a region in contact with an end of the distance image as the candidate area, and when the angle is smaller than a first threshold, the extraction unit excludes, from line segment candidates that approximate the hand region, a line segment whose distance to the end is greater than that of the two line segments, and extracts the line segment farthest from the end among the remaining line segments as the line segment that approximates the hand region.
- (Appendix 3) The image processing apparatus according to appendix 2, wherein, when an angle between a second line segment and a third line segment is included in an angle range and an angle between the third line segment and a fourth line segment is not included in the angle range, the extraction unit excludes the fourth line segment from line segment candidates that approximate the hand region.
- (Appendix 4) The image processing apparatus according to appendix 2 or 3, wherein the extraction unit uses the length of the line segment having the shorter distance to the end among the two line segments to determine whether to exclude the line segment having the greater distance to the end from line segment candidates that approximate the hand region.
- (Appendix 5) The image processing apparatus according to appendix 3 or 4, wherein the extraction unit obtains two intersections between the contour of the candidate area and a straight line passing through each of a plurality of points on the line segment farthest from the end, and extracts, from that line segment, the portion that approximates the hand region based on the distance in the three-dimensional space corresponding to the distance between the two intersections, and the specifying unit specifies the position of the hand using the portion that approximates the hand region.
- (Appendix 6) The image processing apparatus according to any one of appendices 3 to 5, wherein the extraction unit obtains a direction in the three-dimensional space of the line segment whose distance to the end is greater and a direction in the three-dimensional space of the subject captured in a region including that line segment, and excludes that line segment from line segment candidates that approximate the hand region when the angle between the two directions is greater than a third threshold.
- (Appendix 7) The image processing apparatus according to any one of appendices 1 to 6, wherein the extraction unit obtains a curve by thinning the candidate area and obtains a plurality of line segments that approximate the curve as the plurality of line segments that approximate the candidate area.
- (Appendix 8) The image processing apparatus according to any one of appendices 1 to 7, further comprising an upper limb detection unit that obtains a position of a wrist, an elbow, or a shoulder using the distance image and the position of the hand.
- (Appendix 9) The image processing apparatus according to any one of appendices 2 to 8, wherein the first threshold represents a lower limit value of a bending angle of a wrist or elbow joint.
- (Appendix 10) An image processing system comprising: a detection unit that detects a candidate area in which a hand is captured from a distance image; an extraction unit that extracts a line segment that approximates a hand region from a plurality of line segments that approximate the candidate area, based on an angle between two adjacent line segments among the plurality of line segments; a specifying unit that specifies the position of the hand in the candidate area; and a display unit that displays position information indicating the position of the hand.
- (Appendix 11) An image processing program for causing a computer to execute a process comprising: detecting, from a distance image, a candidate area in which a hand is captured; extracting a line segment that approximates a hand region from a plurality of line segments that approximate the candidate area, based on an angle between two adjacent line segments among the plurality of line segments; and specifying the position of the hand in the candidate area using the line segment that approximates the hand region.
- (Appendix 12) The image processing program according to appendix 11, causing the computer to detect a region in contact with an end of the distance image as the candidate area, and, when the angle is smaller than a first threshold, to exclude from line segment candidates that approximate the hand region a line segment whose distance to the end is greater than that of the two line segments, and to extract the line segment farthest from the end among the remaining line segments as the line segment that approximates the hand region.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- Image Analysis (AREA)
Abstract
The problem addressed by the present invention is to accurately identify the position of a hand holding an object from a distance image of the hand. In the proposed solution, a detection unit 211 detects, from a distance image, a candidate region in which a hand is captured. Based on angles between two adjacent line segments among a plurality of line segments that approximate the candidate region, an extraction unit 212 extracts line segments that approximate a hand region from the plurality of line segments that approximate the candidate region. Using the line segments that approximate the hand region, a specifying unit 213 identifies the position of the hand in the candidate region.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017-005872 | 2017-01-17 | ||
JP2017005872A JP2018116397A (ja) | 2017-01-17 | 2017-01-17 | Image processing device, image processing system, image processing program, and image processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018135326A1 true WO2018135326A1 (fr) | 2018-07-26 |
Family
ID=62907902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/000120 WO2018135326A1 (fr) | 2017-01-17 | 2018-01-05 | Dispositif de traitement d'image, système de traitement d'image, programme de traitement d'image et procédé de traitement d'image |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2018116397A (fr) |
WO (1) | WO2018135326A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222379A (zh) * | 2018-11-27 | 2020-06-02 | 株式会社日立制作所 | Hand detection method and device |
CN114158281A (zh) * | 2020-07-07 | 2022-03-08 | 乐天集团股份有限公司 | Region extraction device, region extraction method, and region extraction program |
CN118736626A (zh) * | 2024-09-04 | 2024-10-01 | 宁波星巡智能科技有限公司 | Learning accompaniment method, apparatus, and device based on four-corner detection of handheld homework |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7375456B2 (ja) | 2019-10-18 | 2023-11-08 | 株式会社アイシン | Toe position estimation device and fingertip position estimation device |
EP4064213A1 (fr) * | 2021-03-25 | 2022-09-28 | Grazper Technologies ApS | Utility vehicle and corresponding apparatus, method and computer program for a utility vehicle |
- 2017-01-17: JP JP2017005872A patent/JP2018116397A/ja active Pending
- 2018-01-05: WO PCT/JP2018/000120 patent/WO2018135326A1/fr active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009294843A (ja) * | 2008-06-04 | 2009-12-17 | Tokai Rika Co Ltd | Operator discrimination device and operator discrimination method |
JP2012123667A (ja) * | 2010-12-09 | 2012-06-28 | Panasonic Corp | Posture estimation device and posture estimation method |
JP2012133665A (ja) * | 2010-12-22 | 2012-07-12 | Sogo Keibi Hosho Co Ltd | Grasped object recognition device, grasped object recognition method, and grasped object recognition program |
Non-Patent Citations (2)
Title |
---|
ITOH, MASATSUGU ET AL.: "Hand and object tracking system for presentation scenes", PROCEEDINGS OF THE INFORMATION AND SYSTEMS SOCIETY CONFERENCE OF IEICE 2001, 29 August 2001 (2001-08-29) * |
IWATA, TATSUAKI ET AL.: "3-D information input system based on hand motion recognition by image sequence processing", TECHNICAL REPORT OF IEICE, vol. 100, no. 634, 16 February 2001 (2001-02-16), pages 29 - 36 * |
Also Published As
Publication number | Publication date |
---|---|
JP2018116397A (ja) | 2018-07-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18741358; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18741358; Country of ref document: EP; Kind code of ref document: A1