Disclosure of Invention
The embodiments of the invention aim to provide an object positioning method, an object positioning device and a storage medium that reduce the amount of computation needed to determine the three-dimensional coordinates of an object to be positioned and increase the image-processing speed, while making the result less sensitive to the calibration parameters of the camera device and therefore less error-prone.
To solve the above technical problem, an embodiment of the present invention provides an object positioning method comprising: acquiring, by a camera device, a left view and a right view of an object to be positioned at each of 2N positions under a current background, where N is greater than or equal to 1; determining N parallax images of the object to be positioned according to the left views and the right views at the 2N positions; and calculating the three-dimensional coordinates of the object to be positioned according to the N parallax images of the object to be positioned.
An embodiment of the present invention also provides an object positioning device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the object positioning method.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the above object positioning method.
Compared with the prior art, the embodiment of the invention provides an object positioning method comprising: acquiring, by a camera device, a left view and a right view of an object to be positioned at each of 2N positions under a current background, where N is greater than or equal to 1; determining N parallax images of the object to be positioned according to the left views and the right views at the 2N positions; and calculating the three-dimensional coordinates of the object to be positioned according to the N parallax images. Because the three-dimensional coordinates are calculated from the N parallax images of the object, only the image of the object to be positioned needs to be processed and the background image can be ignored, which reduces the amount of computation needed to determine the three-dimensional coordinates and increases the image-processing speed. Moreover, the whole image does not need to be corrected using the calibration parameters of the camera device, so errors in the calibration parameters have little influence on the result.
In addition, the 2N positions include at least a first position and a second position; the left views include a first left view at the first position and a second left view at the second position; and the right views include a first right view at the first position and a second right view at the second position. Acquiring the left view and the right view of the object to be positioned at each of the 2N positions under the current background by using the camera device specifically includes: acquiring, by the camera device, the first left view and the first right view with the object to be positioned at the first position under the current background, and the second left view and the second right view with the object to be positioned at the second position under the current background.
In addition, determining the N parallax images of the object to be positioned according to the left views and the right views at the 2N positions specifically includes: calculating a left differential image of the object to be positioned according to the first left view and the second left view; calculating a right differential image of the object to be positioned according to the first right view and the second right view; and calculating the parallax image of the object to be positioned according to the left differential image and the right differential image. In this scheme, the left differential image and the right differential image, each obtained by differencing two images, eliminate background noise and reduce the interference of the background with the identification of the object to be positioned in the image.
In addition, the three-dimensional coordinates of the object to be positioned are calculated according to the N parallax images of the object to be positioned, and the method specifically comprises the following steps: acquiring calibration parameters of a camera device; and calculating the three-dimensional coordinates of the object to be positioned in the N parallax images according to the calibration parameters.
In addition, the method for calculating the three-dimensional coordinates of the object to be positioned in the N parallax images according to the calibration parameters specifically comprises the following steps: calculating the sub three-dimensional coordinates of the object to be positioned in each parallax image according to the calibration parameters to obtain N groups of sub three-dimensional coordinates; and fitting and solving the three-dimensional coordinates of the object to be positioned according to the N groups of sub three-dimensional coordinates. According to the scheme, the three-dimensional coordinates of the object to be positioned are obtained through fitting of N groups of sub three-dimensional coordinates, so that the determined three-dimensional coordinates of the object to be positioned are more accurate.
In addition, before calculating the three-dimensional coordinates of the object to be positioned according to the N parallax images of the object to be positioned, the method further includes: acquiring an image of the component that moves the object to be positioned; and eliminating the image of that component from the parallax image. In this scheme, removing the image of the component from the parallax image further reduces the number of objects that must be considered in the image, which further increases the image-processing speed.
In addition, before the image capturing device is used for acquiring the left view and the right view of the object to be positioned at 2N positions respectively under the current background, the method further comprises the following steps: acquiring at least two images of a current background by using a camera device; determining whether the current background is a fixed background or not according to at least two images of the current background; and if the current background is determined to be the fixed background, then acquiring a left view and a right view of the object to be positioned at 2N positions respectively under the current background by using the camera device. According to the scheme, when the current background is determined to be the fixed background, the image of the object to be positioned under the fixed background is obtained, and the problem that the identification of the object to be positioned is influenced due to the change of the current background is avoided.
In addition, determining whether the current background is a fixed background according to at least two images of the current background specifically includes: calculating the change amplitude of the current background according to the at least two images; judging whether the variation amplitude is smaller than a preset threshold value or not; and if the change amplitude is smaller than the preset threshold value, determining that the current background is a fixed background.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to help the reader better understand the present application. The technical solution claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
At present, objects are generally positioned by a robot binocular recognition method. First, stereo calibration is performed on a binocular camera to obtain its internal and external calibration parameters. Then, the binocular camera captures a left image and a right image of the object to be positioned, and both images are corrected using the calibration parameters. Finally, a stereo matching algorithm computes the disparity map of the left and right images, the depth and distance of the object to be positioned in the captured scene are calculated in combination with the calibration parameters, and a depth map and a point cloud map are produced.
The inventor has found the following problems in the prior art. The stereo matching algorithm is complex and computationally expensive, which limits the image-processing speed, and the higher the image resolution, the slower the processing. Image correction and parallax calculation are sensitive to the calibration parameters of the camera: small changes in the parameters greatly affect the result, and because the calibration parameters always carry some error, high accuracy is difficult to achieve. These problems are especially pronounced when identifying small objects. In such tasks the object of interest is far from the background, the background stays constant during the operation, and the detail and precision requirements on the object of interest are high. For example, when a robot holds a small object such as a cup or a pen and performs a fine operation on it, the object often has to be identified and positioned quickly. Stereo matching is difficult in this situation: pixels for which no corresponding matching point can be found become noise, and a small object is easily submerged in that noise.
In view of the above, a first embodiment of the present invention relates to an object positioning method. The core of this embodiment is as follows: a camera device acquires a left view and a right view of an object to be positioned at each of 2N positions under a current background, where N is greater than or equal to 1; N parallax images of the object to be positioned are determined according to the left views and the right views at the 2N positions; and the three-dimensional coordinates of the object to be positioned are calculated according to the N parallax images. Because the three-dimensional coordinates are calculated from the N parallax images of the object, only the image of the object to be positioned needs to be processed and the background image can be ignored, which reduces the amount of computation needed to determine the three-dimensional coordinates and increases the image-processing speed. Moreover, the whole image does not need to be corrected using the calibration parameters of the camera device, so errors in the calibration parameters have little influence on the result.
The following describes the implementation details of the object positioning method of the present embodiment in detail, and the following is only provided for the convenience of understanding and is not necessary for implementing the present embodiment.
Fig. 1 is a schematic flow chart of the object positioning method in the present embodiment:
Step 101: Acquire a left view and a right view of the object to be positioned at each of 2N positions under the current background by using the camera device.
Specifically, the camera device acquires a left view and a right view of the object to be positioned at each of the 2N positions under the current background, where N is greater than or equal to 1. The camera device may be a camera or a robot vision system; this embodiment takes the binocular vision system of a robot as an example. When the images are acquired, the robot may hold the object to be positioned. The robot hand should cover as little of the object as possible, and the holding posture should be chosen so that as much of the object as possible appears in the captured image and the robot arm occupies as little of the image as possible. For example, a screw is held so that the normal direction of the screw hole faces the robot, and a needle is held so that the normal direction of the needle eye faces the robot. In addition, the binocular vision system focuses on the object to be positioned to ensure that it is sharp and clearly visible.
If N is 1, the object to be positioned is placed at the first position, and the binocular vision system of the robot captures a first left view and a first right view (IMG_L1, IMG_R1) of the object at that position. Then, with the pose of the robot head and the way the object is held kept unchanged, the object is translated a certain distance in a direction parallel to the binocular baseline of the robot to a second position, where a second left view and a second right view (IMG_L2, IMG_R2) are captured. Note that the second position should lie within the field of view of the binocular vision system of the robot.
Further, before the image capturing device is used to obtain the left view and the right view of the object to be positioned at 2N positions under the current background, the method further includes: acquiring at least two images of a current background by using a camera device; determining whether the current background is a fixed background or not according to at least two images of the current background; and if the current background is determined to be the fixed background, then acquiring a left view and a right view of the object to be positioned at 2N positions respectively under the current background by using the camera device.
In this embodiment, before the binocular vision system of the robot head acquires images of the object to be positioned, it is determined whether the current background is a fixed background. Images of the object to be positioned are acquired only once the current background has been determined to be fixed, which prevents changes in the background from interfering with the identification of the object. A fixed background is one that is static, that is, one containing no moving people or objects; a wall, for example, can serve as such a background.
Specifically, determining whether the current background is a fixed background according to at least two images of the current background specifically includes: calculating the change amplitude of the current background according to the at least two images; judging whether the variation amplitude is smaller than a preset threshold value or not; and if the change amplitude is smaller than the preset threshold value, determining that the current background is a fixed background. The image change in front of the robot can be detected through the robot binocular vision system, when the image change amplitude is smaller than a preset threshold value, the background is considered to be fixed and stable, and the next step of work can be started. The preset threshold value can be set by a worker according to actual conditions.
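The fixed-background check described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the change amplitude is measured, so the mean absolute per-pixel difference between grayscale frames is used here, and the function names and the default threshold of 2.0 gray levels are hypothetical.

```python
import numpy as np

def change_amplitude(frame_a, frame_b):
    """Mean absolute per-pixel difference between two grayscale frames.

    Assumption: this metric stands in for the unspecified "change amplitude".
    """
    a = np.asarray(frame_a, dtype=np.float64)
    b = np.asarray(frame_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("background frames must have the same shape")
    return float(np.mean(np.abs(a - b)))

def is_fixed_background(frames, threshold=2.0):
    """Return True if every consecutive pair of frames changes by less than threshold.

    threshold is a hypothetical preset value; in practice it would be set
    by a worker according to actual conditions.
    """
    if len(frames) < 2:
        raise ValueError("at least two background images are required")
    return all(change_amplitude(p, q) < threshold
               for p, q in zip(frames, frames[1:]))
```

Acquisition of the left and right views would begin only after `is_fixed_background` returns True for the most recent background frames.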
Step 102: Determine N parallax images of the object to be positioned according to the left views and the right views at the 2N positions.
In this embodiment, the 2N positions include at least a first position and a second position; the left views include a first left view at the first position and a second left view at the second position; and the right views include a first right view at the first position and a second right view at the second position. Acquiring the left view and the right view of the object to be positioned at each of the 2N positions under the current background by using the camera device specifically includes: acquiring, with the camera device, a first left view IMG_L1 and a first right view IMG_R1 with the object to be positioned at the first position against the current background, and a second left view IMG_L2 and a second right view IMG_R2 with the object to be positioned at the second position against the current background.
Determining the N parallax images of the object to be positioned according to the left views and the right views at the 2N positions specifically includes: calculating a left differential image of the object to be positioned according to the first left view and the second left view; calculating a right differential image of the object to be positioned according to the first right view and the second right view; and calculating the parallax image of the object to be positioned according to the left differential image and the right differential image.
Specifically, after the images at the first position and the second position have been captured, the left difference image (IMG_L_DIFF) and the right difference image (IMG_R_DIFF) are obtained by differencing the two left views and the two right views, respectively, that is: IMG_L_DIFF = IMG_L2 - IMG_L1; IMG_R_DIFF = IMG_R2 - IMG_R1. Because each difference image is formed from two images in which only the object has moved, the static background cancels out, which eliminates background noise and reduces the interference of the background with the identification of the object to be positioned in the images. After the left difference image (IMG_L_DIFF) and the right difference image (IMG_R_DIFF) are obtained, the parallax image (IMG_DISPARITY) of the object to be positioned is calculated, namely: IMG_DISPARITY = IMG_R_DIFF - IMG_L_DIFF.
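The differencing step can be illustrated with a short sketch. It rests on assumptions not stated in the source: the views are grayscale arrays of equal size, a hypothetical intensity threshold of 30 separates the moved object from residual noise, and the parallax of the object is read off as the horizontal offset between the changed regions of the left and right difference images (left minus right, so that the value is positive under the usual stereo convention). The source only states that the parallax is obtained from the parallax image by image recognition.

```python
import numpy as np

def difference_mask(view_first, view_second, threshold=30):
    """Binary mask of pixels that changed between the first and second position.

    Signed arithmetic is used so that uint8 inputs do not wrap around.
    threshold is a hypothetical value, not taken from the source.
    """
    first = np.asarray(view_first, dtype=np.int32)
    second = np.asarray(view_second, dtype=np.int32)
    return np.abs(second - first) > threshold

def object_disparity(left_mask, right_mask):
    """Horizontal offset in pixels between the changed regions of the two views.

    Assumption: the mean column index of each mask is used as the object
    location, which presumes the object stays at the same depth while it is
    translated parallel to the baseline.
    """
    if not left_mask.any() or not right_mask.any():
        raise ValueError("no moving object found in one of the difference images")
    left_cols = np.nonzero(left_mask)[1]
    right_cols = np.nonzero(right_mask)[1]
    return float(left_cols.mean() - right_cols.mean())
```

Only the pixels inside the masks take part in the calculation, which is why the cost depends on the size of the object rather than on the full image resolution.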
Step 103: Calculate the three-dimensional coordinates of the object to be positioned according to the N parallax images of the object to be positioned.
Specifically, calculating the three-dimensional coordinates of the object to be positioned according to the N parallax images specifically includes: acquiring calibration parameters of the camera device; and calculating the three-dimensional coordinates of the object to be positioned in the N parallax images according to the calibration parameters. In this scheme, the three-dimensional coordinates of the object in the N parallax images are calculated in combination with the calibration parameters of the robot binocular vision system. Once the three-dimensional coordinates of the object to be positioned (such as a screw or a needle) are obtained, the robot can perform the next operation, such as driving the screw or threading the needle, according to those coordinates. Rapid identification and accurate positioning of the object provide effective visual parameters for the robot's next operation, so that the robot can see the object better and more accurately and complete the fine operation. The scheme is particularly suitable for small articles such as screwdrivers, pens, spoons and even needles: the smaller the object, the more easily it is identified and positioned.
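How a parallax value and the calibration parameters combine into three-dimensional coordinates is not spelled out in the source. A minimal sketch, assuming the idealized rectified pinhole stereo model (depth Z = f * B / d) and a hypothetical `StereoCalibration` container holding the focal length in pixels, the baseline in meters and the principal point, might look like this:

```python
from dataclasses import dataclass

@dataclass
class StereoCalibration:
    """Hypothetical subset of calibration parameters used by this sketch."""
    focal_px: float    # focal length in pixels
    baseline_m: float  # distance between the two cameras in meters
    cx: float          # principal point, horizontal pixel coordinate
    cy: float          # principal point, vertical pixel coordinate

def triangulate(u, v, disparity_px, calib):
    """Convert a left-image pixel (u, v) and its disparity into (X, Y, Z) in meters.

    Assumption: idealized rectified pinhole geometry; the source does not
    state which formula the embodiment uses.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    z = calib.focal_px * calib.baseline_m / disparity_px
    x = (u - calib.cx) * z / calib.focal_px
    y = (v - calib.cy) * z / calib.focal_px
    return (x, y, z)
```

In this model an error in the focal length or baseline scales the result only for the object pixels being triangulated, rather than propagating through a correction of the whole image.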
Further, calculating the three-dimensional coordinates of the object to be positioned in the N parallax images according to the calibration parameters, specifically comprising: calculating the sub three-dimensional coordinates of the object to be positioned in each parallax image according to the calibration parameters to obtain N groups of sub three-dimensional coordinates; and fitting and solving the three-dimensional coordinates of the object to be positioned according to the N groups of sub three-dimensional coordinates.
Specifically, for N parallax images obtained from a left view and a right view at 2N positions, the sub three-dimensional coordinates of the object to be positioned in each parallax image are respectively calculated according to the calibration parameters, N groups of sub three-dimensional coordinates can be obtained, and the three-dimensional coordinates of the object to be positioned are obtained through fitting of the N groups of sub three-dimensional coordinates, so that the determined three-dimensional coordinates of the object to be positioned are more accurate.
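The source does not specify the fitting procedure. As one possible reading, the sketch below assumes that the N groups of sub three-dimensional coordinates have already been expressed relative to the same reference point (for example by subtracting the known translation between positions) and takes their arithmetic mean, which is the least-squares estimate of a single point. The function name `fit_coordinates` is hypothetical.

```python
import numpy as np

def fit_coordinates(sub_coords):
    """Least-squares estimate of one 3D point from N sub-coordinate estimates.

    Assumption: all N estimates refer to the same physical point, so the
    least-squares fit reduces to the per-axis mean.
    """
    points = np.asarray(sub_coords, dtype=np.float64)
    if points.ndim != 2 or points.shape[0] < 1 or points.shape[1] != 3:
        raise ValueError("expected N rows of (x, y, z) with N >= 1")
    return points.mean(axis=0)
```

With N equal to 1 the function simply returns the single estimate; with larger N, independent measurement errors tend to average out, which matches the accuracy benefit the paragraph above describes.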
Further, before calculating the three-dimensional coordinates of the object to be positioned according to the N parallax images of the object to be positioned, the method further includes: acquiring an image of the component that moves the object to be positioned; and eliminating the image of that component from the parallax image. In this scheme, only the object held by the robot is of interest in the parallax image, so the image of the component that moves the object, for example the portion of the robot arm that holds it, is filtered out. In this embodiment, an image of the component that moves the object to be positioned may be stored in advance, so that the image of the component can be eliminated from each parallax image after the N parallax images are obtained. Eliminating the image of the component from the parallax image further reduces the number of objects that must be considered in the image, which further increases the image-processing speed.
Figs. 2 to 5 show an experiment performed with this embodiment, in which a toy represents the robot arm and a toothpick inserted into the toy represents the object to be positioned held by the robot arm. With the toothpick at the first position, the first left view and the first right view are obtained, as shown in Fig. 2(a) and Fig. 2(b), respectively; with the toothpick at the second position, the second left view and the second right view are obtained, as shown in Fig. 3(a) and Fig. 3(b). Moving the toothpick from the first position to the second position yields a left difference image (Fig. 4(a)) and a right difference image (Fig. 4(b)), and differencing these two gives the parallax image (Fig. 5). Image recognition on this parallax image gives a toothpick parallax of 130 pixels, which agrees with the value of about 130 pixels at the corresponding position in the parallax image computed by the SGBM stereo matching algorithm (Fig. 6). However, because the image resolution is large (4096x3040), the stereo matching algorithm takes 12 s to compute, whereas the method of this embodiment takes far less than 12 s. In addition, in the parallax image obtained by the method of this embodiment the neighborhood of the toothpick is visibly almost free of noise, whereas in the result of the stereo matching algorithm it is visibly noisy.
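Removing the moving component from a parallax image can be sketched as a masking operation. The assumption here, not stated in the source, is that the pre-stored image of the component has already been turned into a boolean mask aligned with the parallax image; `remove_component` and its arguments are hypothetical names.

```python
import numpy as np

def remove_component(parallax_image, component_mask):
    """Return a copy of the parallax image with the component's pixels set to zero.

    component_mask is assumed to be a boolean array, True where the component
    (for example the robot arm) appears, with the same height and width as
    the parallax image.
    """
    image = np.asarray(parallax_image)
    mask = np.asarray(component_mask, dtype=bool)
    if image.shape[:2] != mask.shape:
        raise ValueError("mask and parallax image must have the same height and width")
    cleaned = image.copy()
    cleaned[mask] = 0
    return cleaned
```

The input image is left unmodified, so the same stored mask can be applied to each of the N parallax images in turn.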
The differences of the scheme in this embodiment compared to the existing scheme are summarized as follows:
1. This scheme uses dynamic images generated by the motion of the object to be positioned, whereas the existing scheme considers only static images.
2. This scheme considers multiple images at multiple (2N) positions (a left view and a right view at each position, so 4N images in total), whereas the existing scheme processes only two images taken at a single position.
3. This scheme removes background interference by differencing images of the object to be positioned at different positions, which greatly reduces noise. Because the background is filtered out by image differencing, the object to be positioned is highlighted and no texture is required of the background. The matching algorithm of the existing scheme requires strongly textured images, and poorly textured objects (such as white paper or mirror surfaces) are difficult to match.
4. This scheme processes only the difference images to calculate the three-dimensional coordinates, concentrating on the object to be positioned, so the amount of computation is significantly reduced; the existing scheme processes the whole image and does not concentrate on the object. In this scheme, the smaller the object to be positioned, the easier it is to identify; in the existing scheme, smaller objects are harder to match and identify.
5. Because this scheme attends only to the object to be positioned, a higher image resolution yields a more accurate calculation without greatly increasing the amount of computation; in the existing scheme, the higher the resolution, the larger the amount of computation and the higher the matching cost.
6. According to the scheme, only the object to be positioned is concerned, and the image correction is not required to be carried out by utilizing the calibration parameters, so that the influence of errors of the calibration parameters is small.
In summary, compared with the prior art, the object positioning method of this embodiment acquires a left view and a right view of the object to be positioned at each of 2N positions under the current background, where N is greater than or equal to 1, determines N parallax images of the object from those views, and calculates the three-dimensional coordinates of the object from the N parallax images. Only the image of the object to be positioned is processed and the background image is ignored, which reduces the amount of computation and increases the image-processing speed; and because the whole image does not need to be corrected using the calibration parameters of the camera device, errors in the calibration parameters have little influence on the result.
A second embodiment of the invention relates to an object positioning device, as shown in fig. 7, comprising at least one processor 201; and a memory 202 communicatively coupled to the at least one processor 201; the memory 202 stores instructions executable by the at least one processor 201, and the instructions are executed by the at least one processor 201 to enable the at least one processor 201 to execute the above-mentioned object positioning method.
The memory 202 and the processor 201 are connected by a bus. The bus may comprise any number of interconnected buses and bridges, which couple together the various circuits of the one or more processors 201 and the memory 202. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, such as a plurality of receivers and transmitters, and provides a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 201 is transmitted over a wireless medium through an antenna, which also receives incoming data and passes it to the processor 201.
The processor 201 is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 202 may be used to store data used by the processor 201 in performing operations.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the above object positioning method.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.