WO2020235539A1 - Method and device for specifying position and posture of object - Google Patents

Method and device for specifying position and posture of object

Info

Publication number
WO2020235539A1
WO2020235539A1 (application PCT/JP2020/019689)
Authority
WO
WIPO (PCT)
Prior art keywords
model
robot
virtual world
orientation
virtual
Prior art date
Application number
PCT/JP2020/019689
Other languages
French (fr)
Japanese (ja)
Inventor
Lochlainn Wilson (ロクラン ウィルソン)
Pavel Savkin (パーベル サフキン)
Original Assignee
SE4 Co., Ltd. (株式会社エスイーフォー)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SE4 Co., Ltd. (株式会社エスイーフォー)
Publication of WO2020235539A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras

Definitions

  • The present invention relates to a method and a device for specifying the position and orientation of an object.
  • Patent Document 1 discloses an image feature extraction device that estimates the position and orientation of an object by probabilistically determining, from image features, the region in which the object exists in a captured image and then performing detailed matching against a model.
  • Patent Document 2 discloses a robot device comprising: a distance sensor that measures the distance to an object existing in the work environment and generates range data; an initial matching unit that estimates the shape and position/orientation of the object by matching the generated range data against shape data; a camera unit that generates image data capturing the object; a feature extraction unit that extracts features of the image data; and a posture estimation unit that estimates the position and orientation of the object by using the shape and position/orientation estimated by the initial matching unit as initial values and solving a minimization problem whose evaluation function is the error between the features extracted from the image data and the shape and position/orientation of the object.
  • Non-Patent Document 1 proposes a method that solves the PnP problem (the problem of estimating the position and orientation of a calibrated camera from the three-dimensional coordinates of n points in the world coordinate system and the image coordinates at which those points are observed) with a reduced amount of computation.
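  • For orientation only, the following is a minimal sketch of the PnP problem referred to above, solved here with OpenCV's solvePnP; it is not part of the disclosed method, and the point coordinates and camera intrinsics are made-up placeholders.

```python
# Illustrative sketch only: solving a PnP problem with OpenCV's solvePnP.
# The point values and camera intrinsics below are made-up placeholders.
import numpy as np
import cv2

# 3D coordinates of n known points in the world coordinate system (metres).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0],
    [0.0, 0.1, 0.0],
    [0.05, 0.05, 0.02],
    [0.0, 0.05, 0.02],
], dtype=np.float64)

# Image coordinates (pixels) at which those points were observed.
image_points = np.array([
    [320.0, 240.0],
    [400.0, 238.0],
    [402.0, 160.0],
    [322.0, 158.0],
    [361.0, 198.0],
    [330.0, 200.0],
], dtype=np.float64)

# Intrinsics of the calibrated camera and zero lens distortion.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
print("camera pose: R =\n", R, "\nt =", tvec.ravel())
```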
  • In Patent Document 1, the region in which an object exists in the captured image is obtained probabilistically from image features. Because this method must compute an existence probability over the entire image, it incurs a high computational cost.
  • In Patent Document 2, the position and orientation of an object are estimated using, as initial values, the shape and position/orientation estimated by matching the range data, generated by measuring the distance to an object existing in the work environment, against shape data. Because the method of Patent Document 2 presupposes measuring the distance to an object and generating range data, it cannot be used when there is no distance-measuring means or when distance information cannot be obtained.
  • A method that recognizes an object within a captured image may fail to extract the object's feature points correctly when the quality of the captured image is low. In that case, the accuracy of matching between the object and the model decreases, and the position and orientation of the object cannot be recognized correctly.
  • According to one aspect of the present invention, a method of specifying the position and orientation of an object is provided, the method comprising: generating a virtual world that includes a display of an object in the real world; displaying, in the virtual world, a model corresponding to the object; superimposing, in the virtual world, the model on the object; and comparing the object with the model to specify the position and orientation of the object.
  • According to another aspect of the present invention, a device for specifying the position and orientation of an object is provided, the device comprising a processor configured to: generate a virtual world that includes a display of an object in the real world; display, in the virtual world, a model corresponding to the object; superimpose, in the virtual world, the model on the object; and compare the object with the model to specify the position and orientation of the object.
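  • As a reading aid, the claimed flow can be pictured as the hypothetical sketch below (generate the virtual world, display the model, let the user superimpose it, then compare); every function and class name in it is an assumption introduced for illustration, not terminology from the disclosure.

```python
# Hypothetical sketch of the claimed workflow (steps S305-S325); the helper
# functions are placeholder stubs introduced for illustration only.
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple       # (x, y, z) in the virtual-world frame
    orientation: tuple    # quaternion (qx, qy, qz, qw)

def generate_virtual_world(sensor_frames):
    # S305/S310: reproduce the robot's surroundings from sensor data.
    return {"objects": sensor_frames}

def display_model(virtual_world, model):
    # S315: show the model in the virtual world (rendering omitted).
    virtual_world["model"] = model

def user_superimpose(virtual_world, model):
    # S320: the user drags the model onto the object; a fixed guess here.
    return Pose((0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))

def compare_object_and_model(virtual_world, model, rough_pose):
    # S325: refine the pose by matching contours/feature points (omitted).
    return rough_pose

def specify_object_pose(sensor_frames, model):
    world = generate_virtual_world(sensor_frames)
    display_model(world, model)
    rough = user_superimpose(world, model)
    return compare_object_and_model(world, model, rough)

print(specify_object_pose(sensor_frames=[], model="bolt_model_40"))
```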
  • Figure (a) shows an operation instruction being given to a virtual mug, which is a virtual object in the virtual world, and Figure (b) shows the real-world mug being grasped and moved by the robot hand of a real-world robot in accordance with that instruction.
  • FIG. 1 is a block diagram showing an embodiment of a robot control system.
  • FIG. 2 is a diagram showing a schematic configuration of an embodiment of a robot.
  • The robot control system 1 includes a robot 100, a control unit 200 that controls the robot 100, and a control device 300 that controls the control unit 200.
  • The robot 100 disclosed in the present embodiment includes at least two robot arms 120, a robot housing 140 that supports the robot arms 120, an environment sensor 160 that senses the surrounding environment of the robot 100, and a transmission/reception unit 180.
  • Each robot arm 120 in the present embodiment is, for example, a 6-axis articulated arm (hereinafter also referred to as an "arm") and has at its tip a robot hand 122 (hereinafter also referred to as a "hand") serving as an end effector.
  • The robot arm 120 includes actuators (not shown) each having a servomotor on a rotation axis. Each servomotor is connected to the control unit 200, and its operation is controlled based on control signals sent from the control unit 200.
  • In the present embodiment, a 6-axis articulated arm is used as the arm 120, but the number of axes (the number of joints) of the arm can be determined appropriately according to the application of the robot 100, the functions required of it, and the like.
  • In the present embodiment, the two-fingered hand 122 is used as the end effector, but the present invention is not limited to this. For example, a robot hand having three or more fingers, a robot hand provided with attraction means using magnetic force or negative pressure, a robot hand equipped with gripping means that exploits the jamming (clogging) phenomenon of powder or granular material filled in a rubber membrane, or a robot hand that can repeatedly grip and release an object by any other means can be used. The hands 122a and 122b are preferably configured to be rotatable about their wrist portions.
  • The hand 122 is equipped with a kinetic sensor that detects the amount of displacement of the hand 122 and the force, acceleration, vibration, and the like acting on the hand 122. Further, the hand 122 is preferably provided with a tactile sensor that detects the gripping force and tactile sensation of the hand 122.
  • The robot housing 140 may be installed in a state of being fixed on a mounting table (not shown), or may be installed on the mounting table so as to be rotatable via a rotation drive device (not shown).
  • When the robot housing 140 is rotatably installed on the mounting table, the working range of the robot 100 can be expanded not only to the area in front of the robot 100 but also to the area around the robot 100.
  • Depending on the application and usage environment of the robot 100, the robot housing 140 may be mounted on a vehicle, ship, submersible, helicopter, drone, or other moving body equipped with a plurality of wheels or endless tracks, or the robot housing 140 may be configured as part of such a moving body.
  • The robot housing 140 may also have two or more legs as walking means. When the robot housing 140 has such moving means, the working range of the robot 100 can be made even wider. Depending on the application of the robot 100, the robot arm 120 may be fixed directly to a mounting table or the like without the robot housing 140.
  • the environment sensor 160 senses the surrounding environment of the robot 100.
  • The surrounding environment includes, for example, electromagnetic waves (including visible light, invisible light, X-rays, gamma rays, etc.), sound, temperature, humidity, wind velocity, and atmospheric composition. Accordingly, the environment sensor 160 may include, but is not limited to, visual sensors, X-ray and gamma-ray sensors, auditory sensors, temperature sensors, humidity sensors, wind velocity sensors, atmospheric analyzers, and the like.
  • Although the environment sensor 160 is shown integrated with the robot 100 in the figure, the environment sensor 160 does not have to be integrated with the robot 100.
  • The environment sensor 160 may be installed at a position away from the robot 100, or may be installed on a moving body such as a vehicle or a drone. The environment sensor 160 also preferably includes a GPS (Global Positioning System) sensor, an altitude sensor, a gyro sensor, and the like. Further, as position detection means for detecting the position of the robot 100 outdoors or indoors, the environment sensor 160 preferably has, in addition to the GPS sensor, a configuration for WiFi positioning, beacon positioning, self-contained navigation positioning, geomagnetic positioning, sonic positioning, UWB (ultra-wideband) positioning, visible/invisible light positioning, and the like.
  • As the visual sensor, for example, a 2D camera, a depth sensor, a 3D camera, an RGB-D sensor, a 3D-LiDAR sensor, a Kinect™ sensor, or the like can be used.
  • the visual information obtained by the environment sensor 160 is sent to the control unit 200 and processed by the control unit 200.
  • Other environmental information obtained by the environment sensor 160 can also be transmitted to the control unit 200 and used for analysis of the surrounding environment of the robot 100.
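  • As an illustration of how such visual information might be turned into a 3D representation for the virtual world, the following sketch builds a point cloud from one RGB-D frame using the Open3D library; the file names and the default camera intrinsics are assumptions, and this is not the processing actually specified for the control unit 200.

```python
# Sketch only: building a point cloud from one RGB-D frame with Open3D.
# File names and camera intrinsics are placeholder assumptions.
import open3d as o3d

color = o3d.io.read_image("color.png")
depth = o3d.io.read_image("depth.png")
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, convert_rgb_to_intensity=False)

intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

o3d.visualization.draw_geometries([pcd])  # inspect the reconstructed scene
```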
  • the transmission / reception unit 180 transmits / receives signals / information to / from the control unit 200.
  • the transmission / reception unit 180 can be connected to the control unit 200 by a wired connection or a wireless connection, and therefore, transmission / reception of those signals / information can be performed by a wired or wireless connection.
  • the communication protocol, frequency, and the like used for transmitting and receiving those signals and information can be appropriately selected according to the application, environment, and the like in which the robot 100 is used.
  • the transmission / reception unit 180 may be connected to a network such as the Internet.
  • Next, the control unit 200 in the robot control system 1 of the present embodiment will be described.
  • The control unit 200 of the system 1 includes a processor 220, a storage unit 240, and a transmission/reception unit 260.
  • The processor 220 mainly controls the drive units and sensors (both not shown) of the robot arm 120 and housing 140 of the robot 100, controls the environment sensor 160, processes information transmitted from the environment sensor 160, controls the interaction with the control device 300, and controls the transmission/reception unit 260.
  • the processor 220 is composed of, for example, a central processing unit (CPU), an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, a digital signal processor (DSP), or a combination thereof.
  • the processor 220 may be composed of one or more processors.
  • For example, the processor 220 stores in the storage unit 240, as data, the control signal of the robot 100 sent from the control device 300, the operation command generated in response to the control signal, the operation actually executed by the robot 100, and the ambient environment data collected by the environment sensor 160 after the operation is executed, and it executes machine learning using these data to generate learning data, which is also stored in the storage unit 240.
  • From the next time onward, the processor 220 can generate an operation command by deciding, with reference to the learning data, the operation to be executed by the robot 100 based on the control signal transmitted from the control device 300.
  • In this way, the control unit 200 of the robot 100 in the real world has a local machine learning function.
  • The storage unit 240 stores a computer program for controlling the robot 100, a computer program for processing information transmitted from the environment sensor 160, a computer program for interacting with the control device 300 as described in the present embodiment, a computer program for controlling the transmission/reception unit 260, a program for executing machine learning, and the like.
  • In other words, the storage unit 240 stores software or programs that cause a computer to perform the processing described in this embodiment and thereby function as the control unit 200.
  • the storage unit 240 stores a computer program that can be executed by the processor 220, including instructions that implement the methods described below with reference to FIG. 3 and the like.
  • the storage unit 240 stores the data of the model of the known object as described above. Further, the storage unit 240 includes the state of each part (servo (not shown), hand 122, etc.) of the robot arm 120 of the robot 100, information transmitted from the environment sensor 160, information sent from the control device 300, control signals, and the like. It also has the role of storing at least temporarily. Further, as described above, the storage unit 240 also has a role of storing the operation instruction of the robot 100 and the operation and learning data of the robot 100 executed in response to the operation instruction. The storage unit 240 preferably includes a non-volatile storage medium that retains the storage state even when the power of the control unit 200 is turned off.
  • For example, the storage unit 240 is provided with non-volatile storage such as a hard disk drive (HDD), a solid-state storage device (SSD), optical disk storage such as a compact disk (CD), digital versatile disk (DVD), or Blu-ray disk (BD), non-volatile random access memory (NVRAM), EPROM (Erasable Programmable Read Only Memory), or flash memory.
  • The storage unit 240 may further include volatile storage such as static random access memory (SRAM), but each computer program described above is stored in a non-volatile (non-transitory) storage medium of the storage unit 240.
  • the transmission / reception unit 260 transmits / receives signals / information to / from the robot 100 and transmits / receives signals / information to / from the control device 300.
  • the control unit 200 can be connected to the robot 100 by a wired connection or a wireless connection, and therefore, transmission and reception of those signals / information can be performed by a wired or wireless connection.
  • the communication protocol, frequency, and the like used for transmitting and receiving those signals and information can be appropriately selected according to the application, environment, and the like in which the robot 100 is used.
  • the transmission / reception unit 260 may be connected to a network such as the Internet.
  • the transmission / reception unit 260 transmits / receives signals / information to / from the control device 300.
  • the control unit 200 can be connected to the control device 300 by a wired connection or a wireless connection, and therefore, transmission and reception of those signals / information can be performed by a wired or wireless connection.
  • the communication protocol, frequency, and the like used for transmitting and receiving those signals and information can be appropriately selected according to the application, environment, and the like in which the robot 100 is used.
  • Although the control unit 200 is shown as being separate from the robot 100 in FIG. 1, the configuration is not limited to that form.
  • the control unit 200 may be provided in the housing 140 of the robot 100.
  • the number of robots 100 used in this system 1 is not limited to one, and a plurality of robots 100 may be operated independently or in cooperation with each other. In this case, a single control unit 200 may control a plurality of robots 100, or a plurality of control units 200 may cooperate to control a plurality of robots 100.
  • Next, the control device 300 in the robot control system 1 of the present embodiment will be described.
  • the control device 300 of the system 1 includes a processor 320, a storage unit 340, an input device 350, a transmission / reception unit 360, and a display 370.
  • the processor 320 mainly controls the interaction with the control unit 200, the processing based on the input performed by the user via the input device 350, the control of the transmission / reception unit 360, and the display of the display 370.
  • the processor 320 generates a control signal based on the user input input by the input device 350 and transmits it to the control unit 200.
  • Based on the control signal, the processor 220 of the control unit 200 generates one or more operation commands for operating each drive unit (not shown) of the robot arm 120 and housing 140 of the robot 100 and the environment sensor 160.
  • the processor 320 is composed of, for example, a central processing unit (CPU), an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, a digital signal processor (DSP), or a combination thereof.
  • The processor 320 may be composed of one or more processors.
  • the processor 320 of the control device 300 is configured to generate a UI (user interface) screen to be presented to the user and display it on the display 370.
  • the UI screen (not shown) includes, for example, a selection button that hierarchically provides the user with a plurality of options.
  • the processor 320 generates an image or a moving image of a virtual world (simulation space) based on an image or a moving image of the real world of the surrounding environment of the robot 100 taken by the environment sensor 160 of the robot 100, and displays it on the display 370.
  • When the processor 320 generates an image or moving image of the virtual world based on a real-world image or moving image, it builds a correlation between the real world and the virtual world, for example by associating the coordinate system of the real world with the coordinate system of the virtual world. The real-world image or moving image and the virtual-world (simulation space) image or moving image may also be displayed on the display 370 at the same time. Further, the UI screen may be superimposed on the image or moving image of the surrounding environment of the robot 100 or on the image or moving image of the virtual world.
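  • A minimal sketch of such a coordinate correlation, assuming it is represented as a 4x4 homogeneous transform between the real-world and virtual-world frames (the numerical values are invented for illustration):

```python
# Minimal sketch: correlating real-world and virtual-world coordinates with a
# 4x4 homogeneous transform. The example transform below is an assumption.
import numpy as np

# Pose of the virtual-world origin expressed in the real-world frame:
# here, a 90-degree rotation about Z plus a translation (made-up values).
theta = np.pi / 2
T_real_from_virtual = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 0.50],
    [np.sin(theta),  np.cos(theta), 0.0, 0.20],
    [0.0,            0.0,           1.0, 0.00],
    [0.0,            0.0,           0.0, 1.00],
])

def virtual_to_real(p_virtual):
    """Map a 3D point from virtual-world to real-world coordinates."""
    p = np.append(np.asarray(p_virtual, dtype=float), 1.0)  # homogeneous
    return (T_real_from_virtual @ p)[:3]

def real_to_virtual(p_real):
    p = np.append(np.asarray(p_real, dtype=float), 1.0)
    return (np.linalg.inv(T_real_from_virtual) @ p)[:3]

print(virtual_to_real([0.1, 0.0, 0.3]))
```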
  • the virtual world (simulation space) image or moving image generated based on the real world image or moving image of the surrounding environment of the robot 100 also includes an object existing in the surrounding environment of the robot 100.
  • By building this correlation between the real world and the virtual world when the processor 320 generates the virtual-world image or video based on the real-world image or video, it becomes possible, as described in detail below, to make changes in the real world based on the user's operations in the virtual world and to reflect changes in the real world back into the virtual world.
  • the processor 320 of the control device 300 obtains or generates a model corresponding to an object included in an image or a moving image in a virtual world (simulation space).
  • A model corresponding to an object included in an image or moving image of the virtual world (simulation space) is obtained or generated in one of three modes: 1) obtaining ready-made model information, 2) scanning the object to generate a model, or 3) creating a model of the object independently.
  • Mode 1), obtaining ready-made model information, applies when a model corresponding to the scanned object is already available.
  • In this case, the processor 320 of the control device 300 obtains the model corresponding to the object contained in the visual information by referring to object models stored in the storage unit 340 or to object models available on the network to which the control device 300 is connected.
  • Mode 2), scanning an object to generate a model, can be used when a model corresponding to the scanned object is not available.
  • In this mode, the user uses the input device 350 to combine various primitive shape elements in the UI screen and thereby generate a model corresponding to the scanned object.
  • Primitive shape elements include, for example, prisms and pyramids with an arbitrary number of sides, cylinders, cones, and spheres.
  • The processor 320 may also allow the user to draw an arbitrary shape in the UI screen and add it as a primitive shape element.
  • The user selects these various shape elements in the UI screen, adjusts the dimensions of each part of the selected elements as appropriate, and combines the elements according to the image of the scanned object to generate a model corresponding to that object, as in the sketch below. When generating a model using these elements, dents and holes of objects can also be represented.
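  • The sketch below illustrates, under the assumption that a mesh library such as Open3D is used for the geometry, how a composite model might be assembled from primitive elements (a cylinder plus a rectangular parallelepiped); the dimensions are made-up and the interactive UI itself is not reproduced.

```python
# Sketch only: composing a model from primitive elements (cylinder + box),
# roughly in the spirit of the UI described; dimensions are made-up.
import open3d as o3d

# Cylindrical body.
body = o3d.geometry.TriangleMesh.create_cylinder(radius=0.04, height=0.10)

# Rectangular-parallelepiped handle, shifted to the side of the body.
handle = o3d.geometry.TriangleMesh.create_box(width=0.01, height=0.03, depth=0.06)
handle.translate((0.04, -0.015, -0.03))

model = body + handle           # combine the primitives into one mesh
model.compute_vertex_normals()  # needed for shaded display
o3d.visualization.draw_geometries([model])
```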
  • In mode 3), creating a model of the object independently, the user combines various primitive shape elements in the UI screen to generate a model of an arbitrary object without scanning the object.
  • The user's operation for generating the model in this mode is the same as the operation described in mode 2) above.
  • By generating a model in one of these ways, it becomes possible to associate the model with the corresponding object in the virtual world.
  • the processor 320 reproduces the model corresponding to the object by, for example, computer graphics, and displays it in the virtual world on the display 370.
  • the data of the model obtained from the outside or the model generated as described above is stored in the storage unit 340 of the control device 300.
  • The storage unit 340 stores a program for causing the processor 320 to execute the operations described in the present embodiment, a computer program for interacting with the control unit 200, a computer program for performing processing based on input performed interactively by the user on the UI screen via the input device 350, a computer program for controlling the transmission/reception unit 360, a computer program for controlling display on the display 370, and the like.
  • In other words, the storage unit 340 stores software or programs that cause a computer to perform the operations described later and thereby function as the control device 300.
  • the storage unit 340 stores a computer program that can be executed by the processor 320, including instructions that implement the methods described below with reference to FIG. 4 and the like.
  • The storage unit 340 can also at least temporarily store the image or moving image of the surrounding environment of the robot 100 taken by the environment sensor 160 of the robot 100 and sent to the control device 300 via the control unit 200, as well as the image or moving image of the virtual world (simulation space) generated by the processor 320 based on that image or moving image.
  • The storage unit 340 of the control device 300 is also preferably composed of a non-volatile storage medium that retains its contents even when the power of the control device 300 is turned off, such as a hard disk drive (HDD) or a solid-state storage device (SSD).
  • The storage unit 340 may further include volatile storage such as static random access memory (SRAM), but each computer program described above is stored in a non-volatile (non-transitory) storage medium of the storage unit 340.
  • The storage unit 340 also functions as a database of the system 1 and, as described in relation to the concept of the present invention, stores the operation data of the real-world robot 100 operated based on control signals (including the operation commands generated by the control unit 200) and the ambient environment data, detected by the environment sensor 160, that indicates the operation results.
  • As the input device 350, for example, a keyboard, a mouse, a joystick, or the like can be used; a device called a tracker, whose position and posture can be tracked using infrared light or the like and which has a trigger button or the like, can also be used.
  • When the display 370 includes a touch-panel display device, the touch panel can be used as an input device.
  • When the display 370 is a head-mounted display used as a display device for VR (virtual reality), AR (augmented reality), MR (mixed reality), or the like and has a function for tracking the user's line of sight, that line-of-sight tracking function can be used as an input device.
  • A device that has a line-of-sight tracking function but is not a display can likewise use the line-of-sight tracking function as an input device.
  • A voice input device can also be used as an input device.
  • The above are merely examples of the input device 350, and the means usable as the input device 350 are not limited to these. The above-mentioned means may also be combined arbitrarily and used as the input device 350.
  • On the UI screen displayed on the display 370, the user can, for example, select a selection button, input characters, or perform operations on the image of the robot's surroundings taken by the environment sensor 160 of the robot 100.
  • the transmission / reception unit 360 transmits / receives signals / information to / from the control unit 200.
  • the control device 300 can be connected to the control unit 200 by a wired connection or a wireless connection, and therefore, transmission and reception of these signals / information can be performed by a wired or wireless connection.
  • the communication protocol, frequency, and the like used for transmitting and receiving the signal and information can be appropriately selected according to the application and environment in which the system 1 is used.
  • the transmission / reception unit 360 may be connected to a network such as the Internet.
  • As the display 370, any form of display device can be used, such as a display monitor, a computer or tablet device (including devices equipped with touch-panel displays), a head-mounted display used for VR (virtual reality), AR (augmented reality), or MR (mixed reality), or a projector.
  • When a head-mounted display is used as the display 370, the head-mounted display can present images or moving images with parallax between the user's left and right eyes, thereby causing the user to perceive a three-dimensional image or moving image. Further, when the head-mounted display has a motion tracking function, it can display an image or moving image according to the position and orientation of the head of the user wearing it. Furthermore, when the head-mounted display has a line-of-sight tracking function as described above, that function can be used as an input device.
  • In the following description, it is assumed that the processor 320 of the control device 300 generates an image or moving image of the virtual world (simulation space) based on the real-space image or moving image of the surrounding environment of the robot 100 taken by the environment sensor 160 of the robot 100, that a head-mounted display used as a display device for VR (virtual reality), AR (augmented reality), MR (mixed reality), or the like is used as the display 370, and that a tracker that can be tracked using infrared light or the like is used as the input device 350.
  • FIG. 3 is a flowchart illustrating a method of specifying the position / posture of the object in the present embodiment.
  • FIGS. 4 and 5 are diagrams showing objects in the virtual world and the corresponding models.
  • the ambient environment information of the robot 100 in the real world obtained by the environment sensor 160 of the robot 100 is transmitted to the control device 300 via the control unit 200.
  • the visual information may be a single still image, a plurality of images, or a moving image, and preferably includes depth information.
  • the control device 300 may store the transmitted visual information in the storage unit 340.
  • The processor 320 of the control device 300 generates a virtual world (simulation space) that reproduces the surrounding environment of the robot 100 based on the transmitted visual information and displays it on the display 370.
  • In the virtual world, in addition to the real-world scenery around the robot 100, objects in real space existing at least in the area accessible to the robot 100 are displayed.
  • Each object may be displayed as a two-dimensional or three-dimensional image of the real-world object obtained by the visual sensor, a depth map, a point cloud, or the like, or alternatively may be represented by computer graphics representing the object.
  • In this example, a bolt (FIG. 4(a)) and a nut (FIG. 5(a)) are displayed as objects in the virtual world.
  • the user uses the input device 350 of the control device 300 to select a model corresponding to the target object for which the position / posture is specified from the menu on the screen displayed on the display 370.
  • the processor 320 of the control device 300 reads the data of the model from the storage unit 340 and displays the model in the virtual world displayed on the display 370 (step S315 in FIG. 3).
  • For example, the bolt model 40 (FIG. 4(b)) and the nut model 50 (FIG. 5(b)) are displayed in the virtual world.
  • As described above, the model is obtained or generated in one of three modes: 1) obtaining ready-made model information, 2) scanning the object to generate a model, or 3) creating a model of the object independently.
  • In step S320 of FIG. 3, the user superimposes the model obtained or generated as described above on the corresponding object in the virtual world.
  • For example, the user uses the input device 350 to select the model 40 shown in FIG. 4(b) on the screen displayed on the display 370 and moves the model 40 within the screen so that it overlaps the object (FIG. 4(a)) shown in the same screen with their positions and postures substantially matching.
  • The processor 320 then compares the three-dimensional shape data of the object (FIG. 4(a)) (for example, the edges and/or feature points of the three-dimensional shape) with the three-dimensional shape data of the model 40, and corrects the position and orientation of the model 40 so that the contour of the three-dimensional shape of the object (FIG. 4(a)) matches the contour of the three-dimensional shape of the model 40.
  • In this way, it is specified that the object displayed in the virtual world (FIG. 4(a)) is not a mere object occupying a certain volume of space but the object (bolt) corresponding to the model 40 selected by the user.
  • The method for comparing the object and the model is not limited to the above; any other method can be used, one possibility being the registration sketch below.
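  • As one concrete possibility (not specified by the disclosure), the rough superposition provided by the user could be refined with point-to-point ICP registration, for example with Open3D as sketched below; the file names, distance threshold, and initial transform are assumptions.

```python
# One possible comparison method (not specified by the patent): refining the
# user's rough superposition with ICP. File names and threshold are assumptions.
import numpy as np
import open3d as o3d

object_pcd = o3d.io.read_point_cloud("object_scan.ply")   # scanned object
model_pcd = o3d.io.read_point_cloud("model.ply")          # primitive/CAD model

init = np.eye(4)  # pose from the user's manual superposition (placeholder)
result = o3d.pipelines.registration.registration_icp(
    model_pcd, object_pcd, max_correspondence_distance=0.01, init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("refined model pose:\n", result.transformation)
print("fitness:", result.fitness)  # fraction of matched points
```

The resulting transformation would then play the role of the corrected position and orientation of the model.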
  • The processor 320 of the control device 300 recognizes that the object (FIG. 4(a)) is the object corresponding to the model 40, reproduces a virtual object corresponding to the object specified as described above, for example by computer graphics, and displays it on the display 370 superimposed on the object.
  • the virtual object may be reproduced using, for example, an image of a model (computer graphics).
  • The user can then use the input device 350 to perform operations such as moving the virtual object within the virtual world displayed on the display 370.
  • For example, when a tracker is used as the input device 350, the user points at a virtual object with the tracker and presses the trigger button; while the trigger button is pressed, the virtual object can be moved freely in the virtual world following the movement of the tracker. By releasing the trigger button after moving the virtual object to the desired position and posture in the virtual world, the movement operation of the virtual object is completed.
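  • The trigger-press, drag, and release interaction described above can be pictured with the hypothetical sketch below; the class and attribute names are assumptions, and only position (not orientation) is handled for brevity.

```python
# Hypothetical sketch of the tracker interaction described above; class and
# attribute names are illustration-only assumptions.
import numpy as np

class VirtualObject:
    def __init__(self, position):
        self.position = np.asarray(position, dtype=float)

class VirtualObjectDragger:
    def __init__(self):
        self.grabbed = None
        self.offset = None

    def on_trigger_press(self, tracker_position, pointed_object):
        # Start moving the object the tracker is pointing at.
        self.grabbed = pointed_object
        self.offset = pointed_object.position - np.asarray(tracker_position)

    def on_tracker_move(self, tracker_position):
        # While the trigger is held, the object follows the tracker.
        if self.grabbed is not None:
            self.grabbed.position = np.asarray(tracker_position) + self.offset

    def on_trigger_release(self):
        # Releasing the trigger fixes the object at its current position.
        self.grabbed = None

# Usage: press, drag, release.
mug = VirtualObject([0.3, 0.0, 0.1])
drag = VirtualObjectDragger()
drag.on_trigger_press([0.25, 0.0, 0.1], mug)
drag.on_tracker_move([0.45, 0.1, 0.2])
drag.on_trigger_release()
print(mug.position)  # -> [0.5, 0.1, 0.2]
```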
  • the user can also operate two virtual objects at the same time in the virtual world by operating the two trackers with both hands at the same time.
  • It is also possible to turn the hand 122 of the robot 100 into an object, display it as a virtual hand, and operate the virtual hand with the tracker to move a virtual object.
  • In this case, the virtual hand is moved, and the virtual object to be moved can be moved while being grasped by the virtual hand.
  • By releasing the trigger button after moving the virtual object to the desired position and posture in the virtual world, the movement operation of the virtual hand and the virtual object is completed.
  • the operation of virtual objects and virtual hands in the virtual world using the input device 350 as described above, and the display thereof on the display 370 are controlled by the processor 320.
  • In response to the operation of the virtual hand as described above, the processor 320 of the control device 300 generates a control signal for causing the hand 122 of the robot 100 in the real world to move the real object corresponding to the virtual object.
  • the control signal is transmitted from the control device 300 to the control unit 200.
  • the control unit 200 that has received the control signal performs motion planning of the robot 100 based on the received control signal and the surrounding environment information detected by the environment sensor 160 of the robot 100, and generates an operation command to be executed by the robot 100. Then, the robot 100 is operated based on the operation command.
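  • The control-unit flow described above (control signal in, motion plan, operation command out) might be organized as in the hypothetical sketch below; the data classes and the trivial placeholder planner are assumptions, and a real system would use a collision-aware motion planner fed with the sensed environment.

```python
# Hypothetical sketch of the control-unit flow described above; the data
# classes and the trivial "planner" are assumptions for illustration only.
from dataclasses import dataclass
from typing import List

@dataclass
class ControlSignal:          # from the control device 300
    target_object: str
    target_position: tuple    # desired position in real-world coordinates

@dataclass
class OperationCommand:       # sent to the robot's actuators
    joint_angles: List[float]

def plan_motion(signal: ControlSignal, environment: dict) -> List[OperationCommand]:
    # A real implementation would run collision-aware motion planning using
    # the environment sensed by the environment sensor 160; here we just
    # return a single placeholder command.
    return [OperationCommand(joint_angles=[0.0, -0.5, 1.0, 0.0, 0.5, 0.0])]

def execute(command: OperationCommand):
    print("sending joint targets:", command.joint_angles)

def on_control_signal(signal: ControlSignal, environment: dict):
    for command in plan_motion(signal, environment):
        execute(command)

on_control_signal(ControlSignal("mug_80", (0.4, 0.1, 0.2)), environment={})
```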
  • As described above, in the present embodiment, the position and posture of an object are specified by superimposing the model corresponding to the object on the object in the virtual world that corresponds to the object in the real world.
  • In the present embodiment, the user specifies the target object in the virtual world; therefore, the computational cost can be kept lower than when the target object and its position and orientation are identified by image processing.
  • In the present embodiment, even if the object displayed in the virtual world is somewhat unclear, the user can recognize the object based on his or her own perception, select the corresponding model, and superimpose the model on the object to identify the position and orientation of the object. Once the model corresponding to the object is selected, the processor 320 of the control device 300 uniquely identifies the position and orientation of the object by comparing information such as the contours, feature points, and edges of the object and the model.
  • the control device 300 can specify the position and orientation of the object in the real world.
  • FIGS. 6 and 7 show, as another example of specifying the position and orientation of an object, an example of specifying the position and orientation of assembly blocks.
  • FIG. 6A shows a state in which a plurality of assembly blocks are placed on a table in the real world.
  • The surrounding environment including these multiple assembly blocks is imaged using a depth camera DC (for example, an Intel RealSense™ depth camera), the back of which is visible at the bottom of FIG. 6(a).
  • the depth camera DC corresponds to the environment sensor 160 of the robot 100 of the present embodiment described with reference to FIGS. 1 and 2.
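  • For reference, a depth camera of this kind can typically be read as in the sketch below, which uses the pyrealsense2 library to grab one pair of color and depth frames; the stream resolutions and frame rate are illustrative assumptions.

```python
# Sketch only: grabbing one color + depth frame from an Intel RealSense-type
# camera with pyrealsense2; stream settings are illustrative assumptions.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()
    depth = np.asanyarray(depth_frame.get_data())   # 16-bit depth values
    color = np.asanyarray(color_frame.get_data())   # BGR image
    print("depth image:", depth.shape, "color image:", color.shape)
finally:
    pipeline.stop()
```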
  • FIG. 6B shows a virtual world generated based on the image data captured by the depth camera.
  • a plurality of blocks corresponding to each assembly block in the real world of FIG. 6 (a) are displayed in the virtual world of FIG. 6 (b).
  • This virtual world is generated in the control device 300 as described above. At this stage, those blocks are recognized by the control device 300 as merely objects occupying a certain volume space, and the positions and orientations of the blocks have not yet been specified.
  • Some blocks are displayed as if parts of them were missing. This is because, when a block is imaged by the depth camera DC placed at a fixed position, sufficient image information cannot be obtained for the far side of the block. When part of an object is missing in this way, it is difficult to recognize the position and orientation of the object correctly even with image processing techniques using machine learning or AI.
  • In contrast, in the present embodiment, the object and its position and posture can be recognized by incorporating human cognitive ability.
  • FIG. 6B also displays a screen showing the types and numbers of blocks displayed in the virtual world and how to combine them.
  • FIG. 7 shows an operation of superimposing a model on a block in a virtual world to specify the position and orientation of the block.
  • FIG. 7A shows a state in which the model 70 corresponding to the block blc is moved toward the block blc.
  • The user uses the input device 350 to select the model 70 in the virtual world displayed on the display 370.
  • While the trigger button is pressed, the model 70 can be moved freely in the virtual world following the movement of the tracker.
  • By releasing the trigger button after moving the model 70 to the desired position and posture, the movement operation of the model 70 is completed.
  • FIG. 7B shows a state in which the model 70 is superposed on the block blc so that the positions and postures substantially match. In this state, the user releases the trigger button of the tracker to end the movement operation of the model 70.
  • The processor 320 of the control device compares the three-dimensional shape data of the block blc (for example, its edges and/or feature points) with the three-dimensional shape data of the model 70, and corrects the position and orientation of the model 70 so that the contour of the three-dimensional shape of the block blc matches the contour of the three-dimensional shape of the model 70.
  • The processor 320 of the control device highlights this matched three-dimensional shape contour (FIG. 7(c)).
  • In this way, the position and orientation of the block blc in the virtual world are specified.
  • The processor 320 of the control device can then identify, in the real world, the position and orientation of the real block corresponding to the block blc in the virtual world, based on the correlation between the coordinate system of the virtual world and the coordinate system of the real world.
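  • Continuing the earlier coordinate-correlation sketch, mapping a pose identified in the virtual world into the real world amounts to one matrix product; the transforms below are placeholders.

```python
# Sketch: mapping a block pose identified in the virtual world into the real
# world, reusing a 4x4 transform like T_real_from_virtual above (assumed).
import numpy as np

T_real_from_virtual = np.eye(4)            # placeholder correlation transform
T_virtual_block = np.eye(4)                # block pose found in the virtual world
T_virtual_block[:3, 3] = [0.30, 0.10, 0.05]

T_real_block = T_real_from_virtual @ T_virtual_block
print("block position in the real world:", T_real_block[:3, 3])
print("block orientation (rotation matrix):\n", T_real_block[:3, :3])
```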
  • the system 1 recognizes the position and orientation of the block in the real world, and the robot 100 can perform a desired operation on the block.
  • FIG. 8 is a diagram showing a robot (unmanned submersible) used in this example and a pipe to be worked underwater.
  • The robot 100 used in this example has the form of an unmanned submersible and includes a robot arm 120 having a robot hand 122 at its tip and a housing 140 on which the robot arm 120 is installed.
  • the robot 100 moves in the horizontal direction (X-axis direction), the front-back direction (Y-axis direction), and the vertical direction (Z-axis direction) in water, and rotates around each XYZ axis.
  • the housing 140 is provided with an environment sensor and a transmission / reception unit described with reference to FIG. 2 and the like.
  • The housing 140 is provided with at least a visual sensor as the environment sensor, whereby visual information on the surrounding environment of the robot 100 (in particular, the environment including the robot arm 120 and the hand 122 in front of the robot 100) can be acquired. Since the other configurations of the robot (unmanned submersible) 100 used in this example are the same as those described above with reference to FIG. 2, detailed description thereof is omitted here.
  • the system configuration and control method used in this example are the same as those described above. In this example, the characteristic points regarding the task of grasping the underwater pipe with the robot hand at the tip of the robot arm of the robot in the form of an unmanned submersible will be described.
  • FIG. 8 shows a virtual world generated based on the environmental information acquired by the environmental sensor of the robot (unmanned submersible) 100.
  • the shape and function of each part of the robot (unmanned submersible) 100 is modeled and pre-stored in at least the storage unit 340 of the control device 300, and is therefore known in the system 1. Therefore, the modeled robot (unmanned submersible) 100 is displayed in the virtual world.
  • the pipe 60 in the virtual world is displayed in a state of being reproduced based on the environmental information acquired by the environmental sensor of the robot (unmanned submersible) 100.
  • Since the pipe 60 is photographed only from a specific direction by the environment sensor of the unmanned submersible 100, it is reproduced only in the shape recognizable from the photographed direction, and the portion on the opposite side is not reproduced. In FIG. 8, the shape of the left-hand portion of the pipe 60 in the drawing is reproduced, while the right-hand portion of the pipe 60 in the drawing is displayed in a missing state.
  • FIG. 9 is a diagram showing how the position / orientation of the pipe shown in FIG. 8 is specified in the virtual world.
  • FIG. 9 shows the virtual world generated by the control device 300 according to steps S305 and S310 shown in FIG. 3.
  • For the pipe 60 displayed in the virtual world on the display 370, the user creates a model corresponding to the object according to mode 2) described above, scanning the object to generate a model, and specifies the position and orientation of the object using the model.
  • The scanned object, the pipe 60, is displayed on the display 370 as shown in FIG. 9(a).
  • The user generates a cylindrical model using the tracker 350 (the corresponding virtual tracker 350_vr is shown in FIG. 9) in the UI screen displayed on the display 370, adjusts its diameter and length (FIG. 9(b)), and moves it so that it overlaps the pipe 60 in the virtual world displayed on the display 370 (FIG. 9(c)).
  • In this way, the display of the model corresponding to the pipe 60, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
  • The processor 320 compares the three-dimensional shape data of the pipe 60, which is the object (for example, the edges and/or feature points of its three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the pipe 60 matches the contour of the three-dimensional shape of the model.
  • In this way, it is specified in the control device 300 that the pipe 60 displayed in the virtual world is not a mere object occupying a certain volume of space but the object (pipe 60) corresponding to the model selected by the user, and that the pipe 60 exists in the virtual world at the corrected position and orientation (step S325 in FIG. 3).
  • the pipe 60 whose position and orientation are specified in this way is displayed in the virtual world as a virtual pipe 60_vr by, for example, a computer graphics representation representing a model. Thereby, for example, it is possible to instruct the operation of grasping the virtual object (virtual pipe 60_vr) of the pipe 60 with the robot hand 122 of the robot (unmanned submersible) 100.
  • FIG. 10 is a diagram showing a mug 80 placed on a table.
  • the mug 80 has a handle 82 and a main body 84.
  • The handle 82 of the mug 80 can be gripped by the hand 122 provided on the arm 120 of the robot 100 described with reference to FIGS. 1 and 2. For example, with the hand 122 gripping the handle 82, the robot arm 120 can move the mug 80 on the table.
  • FIG. 11 is a diagram showing how the position / orientation of the mug 80 on the table shown in FIG. 10 is specified in the virtual world.
  • FIG. 11 shows the virtual world generated by the control device 300 according to steps S305 and S310 shown in FIG. 3.
  • For the mug 80 displayed in the virtual world on the display 370, the user creates models corresponding to the object according to mode 2) described above, scanning the object to generate a model, and specifies the position and orientation of the object using the models.
  • The scanned object is displayed on the display 370 as shown in FIG. 11(a).
  • First, the user generates a cylindrical model using the tracker 350 (the corresponding virtual tracker 350_vr is shown in FIG. 11) in the UI screen displayed on the display 370.
  • This model is moved so as to be superimposed on the main body 84 of the mug 80 in the virtual world displayed on the display 370 (FIG. 11(a)), and its diameter and length are adjusted (FIG. 11(b)).
  • In this way, the display of the model corresponding to the main body 84 of the mug 80, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
  • The processor 320 compares the three-dimensional shape data of the main body 84, which is the object (for example, the edges and/or feature points of its three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the main body 84 matches the contour of the three-dimensional shape of the model.
  • In this way, it is specified in the control device 300 that the main body 84 displayed in the virtual world is not a mere object occupying a certain volume of space but the object (main body 84) corresponding to the model selected by the user, and that the main body 84 exists in the virtual world at the corrected position and orientation (step S325 in FIG. 3).
  • Next, the tracker 350 (the virtual tracker 350_vr shown in FIG. 11) is used to generate a rectangular parallelepiped model corresponding to the handle 82 of the mug 80; this model is moved so as to overlap the handle 82 of the mug 80 in the virtual world displayed on the display 370, and its height, width, and depth are adjusted (FIG. 11(d)). In this way, the display of the model corresponding to the handle 82 of the mug 80, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
  • The processor 320 compares the three-dimensional shape data of the handle 82, which is the object (for example, the edges and/or feature points of its three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the handle 82 matches the contour of the three-dimensional shape of the model.
  • In this way, it is specified in the control device 300 that the handle 82 displayed in the virtual world is not a mere object occupying a certain volume of space but the object (handle 82) corresponding to the model selected by the user, and that the handle 82 exists in the virtual world at the corrected position and orientation (step S325 in FIG. 3).
  • the user performs an input operation on the UI screen displayed on the display 370 to the effect that the handle 82 of the mug 80 and the main body 84 are integrated by using the tracker 350.
  • the processor 320 of the control device 300 recognizes that the handle 82 and the main body 84 are integrated at the specified positions and postures.
  • the mug 80 whose position and posture are specified in this way is displayed in the virtual world as a virtual mug by, for example, a computer graphics representation representing a model of the handle 82 and the main body 84.
  • As shown in FIG. 12(a), an operation of grasping and moving the virtual object of the mug 80 with the robot hand 122 of the robot 100 is instructed in the virtual world, and the mug 80 in the real world can then be gripped and moved by the robot hand 122 of the robot 100 in the real world (FIG. 12(b)).
  • In the above description, the robot 100 having an arm with a hand is illustrated as the form of the robot, but the form of the robot controlled by the present invention is not limited to this; for example, the robot may be a vehicle, a ship, a submersible, a drone, a construction machine (excavator, bulldozer, crane, etc.), or the like.
  • The environments and applications in which robots operable using the system of this embodiment can be used include space development, mining, resource extraction, agriculture, forestry, and fisheries, among others.
  • The objects operated by the robot of the present embodiment vary depending on the environment and application in which the robot is used. As an example, when an excavator is used as the robot, excavated soil, sand, and the like are also objects.
1 Robot control system, 100 Robot, 200 Control unit, 300 Control device


Abstract

An embodiment of the present invention discloses a method for specifying the position and posture of an object. This method comprises: creating a virtual world including a display of an object in the real world (S310); displaying, in the virtual world, a model corresponding to the object (S315); superimposing, in the virtual world, the model on the object (S320); and specifying the position and posture of the object by comparing the object and the model (S325).

Description

Method and device for specifying the position and orientation of an object

 The present invention relates to a method and a device for specifying the position and orientation of an object.
 In order for a robot to manipulate an object, it needs to recognize the position and posture of that object. As a technique for recognizing the position and orientation of an object, a technique is known in which feature points of the object are extracted and matched against a model.
 Patent Document 1 discloses an image feature extraction device that estimates the position and orientation of an object by probabilistically determining, from image features, the region in which the object exists in a captured image and then performing detailed matching against a model.
 Further, Patent Document 2 discloses a robot device comprising: a distance sensor that measures the distance to an object existing in the work environment and generates range data; an initial matching unit that estimates the shape and position/orientation of the object by matching the generated range data against shape data; a camera unit that generates image data capturing the object; a feature extraction unit that extracts features of the image data; and a posture estimation unit that estimates the position and orientation of the object by using the shape and position/orientation estimated by the initial matching unit as initial values and solving a minimization problem whose evaluation function is the error between the features extracted from the image data and the shape and position/orientation of the object.
 Non-Patent Document 1 proposes a method that solves the PnP problem (the problem of estimating the position and orientation of a calibrated camera from the three-dimensional coordinates of n points in the world coordinate system and the image coordinates at which those points are observed) with a reduced amount of computation.
 Patent Document 1: Japanese Patent No. 3300092. Patent Document 2: Japanese Unexamined Patent Publication No. 2010-60451.
 オブジェクトの特徴点をモデルと照合することでオブジェクトの位置・姿勢を認識する技術では、前提として、撮像した画像内に存在する対象オブジェクトを特定する必要がある。 In the technology of recognizing the position and orientation of an object by collating the feature points of the object with the model, it is necessary to identify the target object existing in the captured image as a premise.
 特許文献1では、撮像した画像内においてオブジェクトが存在する範囲を画像特徴から確率的に求めている。特許文献1の手法では、画像全体に対して存在確率を計算する必要があるため、高い計算コストを要する。 In Patent Document 1, the range in which an object exists in the captured image is stochastically obtained from the image features. In the method of Patent Document 1, it is necessary to calculate the existence probability for the entire image, so that a high calculation cost is required.
 特許文献2では、作業環境内に存在する物体との距離を計測して生成したレンジデータを形状データと照合することによって推定した物体の形状と位置姿勢を初期値として用いて、オブジェクトの位置姿勢を推定している。特許文献2の手法は、物体との距離を計測してレンジデータを生成することを前提としているため、測距手段が無い、あるいは距離情報を得られない条件の下では用いることができない。 In Patent Document 2, the position and orientation of an object are used as initial values, using the shape and position and orientation of the object estimated by collating the range data generated by measuring the distance to the object existing in the work environment with the shape data. Is estimated. Since the method of Patent Document 2 is premised on measuring the distance to an object and generating range data, it cannot be used under the condition that there is no distance measuring means or distance information cannot be obtained.
 また、非特許文献1に提案された手法によっても、オブジェクトの位置姿勢を正確に推定することは困難である。 Also, it is difficult to accurately estimate the position and orientation of an object even by the method proposed in Non-Patent Document 1.
 そもそも、撮像した画像内でオブジェクトを画像認識する手法では、撮像した画像の品質の程度が低い場合にはオブジェクトの特徴点を正しく抽出できない可能性がある。その場合には、オブジェクトとモデルとの照合の精度が低下し、オブジェクトの位置・姿勢を正しく認識することができない。 In the first place, the method of recognizing an object in the captured image may not be able to correctly extract the feature points of the object if the quality of the captured image is low. In that case, the accuracy of collation between the object and the model is lowered, and the position / orientation of the object cannot be recognized correctly.
 上記技術に加え、対象オブジェクトの位置・姿勢の認識に機械学習や人工知能(AI)を用いることも可能である。しかし、学習には多くの計算コストがかかる上に、必ずしも正しい結果が得られるとは限らない。 In addition to the above technology, it is also possible to use machine learning or artificial intelligence (AI) to recognize the position and orientation of the target object. However, learning requires a lot of computational cost, and correct results are not always obtained.
 また、オブジェクトにARマーカを付してオブジェクトの位置・姿勢を認識する手法が知られている。しかし、現実世界の周囲環境中に存在するすべてのオブジェクトにARマーカを付すことは現実的ではない。さらに、オブジェクトを撮像するカメラの死角にARマーカが位置する場合には、オブジェクトの位置・姿勢を認識することができない。 Also, a method of recognizing the position / orientation of an object by attaching an AR marker to the object is known. However, it is not realistic to attach AR markers to all objects that exist in the surrounding environment in the real world. Further, when the AR marker is located in the blind spot of the camera that images the object, the position / orientation of the object cannot be recognized.
According to one aspect of the present invention, there is provided a method of specifying the position and orientation of an object, the method including: generating a virtual world that includes a representation of an object in the real world; displaying, in the virtual world, a model corresponding to the object; superimposing the model on the object in the virtual world; and comparing the object with the model to specify the position and orientation of the object.
According to another aspect of the present invention, there is provided a device for specifying the position and orientation of an object, the device including a processor configured to: generate a virtual world that includes a representation of an object in the real world; display, in the virtual world, a model corresponding to the object; superimpose the model on the object in the virtual world; and compare the object with the model to specify the position and orientation of the object.
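The following is a minimal, self-contained sketch of the claimed flow using synthetic point data. The helper names and the centroid-based comparison are illustrative assumptions, not the implementation disclosed in the embodiments.

```python
import numpy as np

def generate_virtual_world(sensor_points):
    """Step 1: here the virtual world is simply the captured object point cloud."""
    return {"object": sensor_points}

def specify_pose(virtual_world, model_points, rough_translation):
    """Steps 2-4: display the model, superimpose it roughly, then compare and correct."""
    placed_model = model_points + rough_translation            # model superimposed by the user
    correction = virtual_world["object"].mean(axis=0) - placed_model.mean(axis=0)
    return rough_translation + correction                      # corrected object position

rng = np.random.default_rng(0)
model_points = rng.random((200, 3))                            # model of the object
sensor_points = model_points + np.array([1.0, 2.0, 0.5])       # the real object, observed displaced
world = generate_virtual_world(sensor_points)
print(specify_pose(world, model_points, rough_translation=np.array([0.9, 2.1, 0.4])))
```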
Other features and advantages of the present invention will be understood from the following description and the accompanying drawings, which are given by way of example and are not exhaustive.
A block diagram showing an embodiment of a robot control system.
A diagram showing the schematic configuration of an embodiment of a robot.
A flowchart explaining the method of specifying the position and orientation of an object in the present embodiment.
A diagram showing an object and its corresponding model.
A diagram showing an object and its corresponding model.
A diagram showing, as another example of specifying the position and orientation of an object, an example of specifying the position and orientation of an assembly block.
A diagram showing, as another example of specifying the position and orientation of an object, an example of specifying the position and orientation of an assembly block.
A diagram showing a robot (unmanned submersible) and a pipe that is the target of underwater work.
A diagram showing how the position and orientation of the pipe are specified.
A diagram showing how an annotation (attribute information) is attached to the pipe.
A diagram showing how the positions and orientations of the handle and body of a mug on a table are specified.
A figure in which (a) shows an operation instruction being given to a virtual mug, which is a virtual object in the virtual world, and (b) shows the real-world mug being gripped and moved by the robot hand of the real-world robot in accordance with that instruction.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an embodiment of a robot control system. FIG. 2 is a diagram showing the schematic configuration of an embodiment of a robot.
As shown in FIG. 1, the robot control system 1 according to the present embodiment includes a robot 100, a control unit 200 that controls the robot 100, and a control device 300 that governs the control unit 200.
First, the robot 100 in the robot control system 1 of the present embodiment will be described.
As shown in FIGS. 1 and 2, the robot 100 disclosed in the present embodiment includes, as an example, at least two robot arms 120, a robot housing 140 that supports the robot arms 120, an environment sensor 160 that senses the surrounding environment of the robot 100, and a transmission/reception unit 180.
Each robot arm 120 in the present embodiment is, for example, a six-axis articulated arm (hereinafter also referred to simply as an "arm") and has, at its tip, a robot hand 122 (hereinafter also referred to as a "hand") as an end effector. The robot arm 120 includes actuators (not shown) having a servomotor on each rotation axis. Each servomotor is connected to the control unit 200 and is configured to be operated based on control signals sent from the control unit 200. Although a six-axis articulated arm is used as the arm 120 in the present embodiment, the number of axes (number of joints) of the arm can be determined as appropriate according to the application of the robot 100, the functions required of it, and so on. Further, although a two-fingered hand 122 is used as the end effector in the present embodiment, the end effector is not limited to this; for example, a robot hand having three or more fingers, a robot hand having suction means using magnetic force or negative pressure, a robot hand having gripping means that exploits the jamming (clogging) phenomenon of granular material packed in a rubber membrane, or any other hand capable of repeatedly gripping and releasing an object by arbitrary means may be used. Each hand 122a, 122b is preferably configured to be rotatable about its wrist portion.
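As a purely illustrative sketch (not taken from the embodiment), a joint-angle command for a six-axis arm might be clamped to per-joint limits before being handed to the servo controllers; the limit values and data layout below are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JointCommand:
    angles_deg: List[float]          # one target angle per joint, base to wrist

# Assumed per-joint limits for a generic six-axis arm (degrees).
JOINT_LIMITS_DEG = [(-170, 170), (-120, 120), (-150, 150), (-180, 180), (-120, 120), (-360, 360)]

def clamp_command(cmd: JointCommand) -> JointCommand:
    """Clamp each requested joint angle into its allowed range."""
    clamped = [max(lo, min(hi, a)) for a, (lo, hi) in zip(cmd.angles_deg, JOINT_LIMITS_DEG)]
    return JointCommand(clamped)

print(clamp_command(JointCommand([0, 200, -10, 45, 0, 400])))
```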
The hand 122 is provided with a dynamics sensor that detects the displacement of the hand 122 and the force, acceleration, vibration, and the like acting on it. The hand 122 preferably further includes a tactile sensor that detects the gripping force and tactile sensation of the hand 122.
The robot housing 140 may be installed, for example, in a fixed state on a mounting table (not shown), or may be installed on the mounting table so as to be able to swivel via a rotary drive device (not shown). When the robot housing 140 is installed on the mounting table so as to be able to swivel, the working range of the robot 100 can be expanded not only to the area in front of the robot 100 but also to the area around it. Furthermore, depending on the application and operating environment of the robot 100, the robot housing 140 may be mounted on a vehicle having a plurality of wheels or crawler tracks, a ship, a submersible, a flying body such as a helicopter or drone, or any other moving body, or the robot housing 140 may itself be configured as part of such a moving body. The robot housing 140 may also have two or more legs as walking means. By providing the robot housing 140 with such moving means, the working range of the robot 100 can be made even wider. Depending on the application of the robot 100, the robot arm 120 may be fixed directly to a mounting table or the like without the robot housing 140.
The environment sensor 160 senses the surrounding environment of the robot 100. The surrounding environment includes, for example, electromagnetic waves (including visible light, invisible light, X-rays, gamma rays, and the like), sound, temperature, humidity, wind speed, and atmospheric composition; accordingly, the environment sensor 160 may include, but is not limited to, visual sensors, X-ray and gamma-ray sensors, auditory sensors, temperature sensors, humidity sensors, wind-speed sensors, atmosphere analyzers, and the like. Although the environment sensor 160 is shown in the figure as being integrated with the robot 100, it does not have to be; for example, the environment sensor 160 may be installed at a position away from the robot 100, or on a moving body such as a vehicle or drone. The environment sensor 160 preferably also includes a GPS (Global Positioning System) sensor, an altitude sensor, a gyro sensor, and the like. Furthermore, for detecting the position of the robot 100 outdoors or indoors, the environment sensor 160 preferably includes, as position-detecting means in addition to the GPS sensor, configurations for WiFi positioning, beacon positioning, dead-reckoning positioning, geomagnetic positioning, acoustic positioning, UWB (Ultra Wide Band) positioning, visible/invisible light positioning, and the like.
In particular, as the visual sensor, for example, a 2D camera and depth sensor, a 3D camera, an RGB-D sensor, a 3D-LiDAR sensor, a Kinect™ sensor, or the like can be used. The visual information obtained by the environment sensor 160 is sent to the control unit 200 and processed there. Other environmental information obtained by the environment sensor 160 can also be transmitted to the control unit 200 and used to analyze the surrounding environment of the robot 100.
The transmission/reception unit 180 transmits and receives signals and information to and from the control unit 200. The transmission/reception unit 180 can be connected to the control unit 200 by a wired or wireless connection, so those signals and information can be exchanged by wire or wirelessly. The communication protocol, frequency, and the like used for this exchange can be selected as appropriate according to the application and environment in which the robot 100 is used. The transmission/reception unit 180 may further be connected to a network such as the Internet.
Next, the control unit 200 in the robot control system 1 of the present embodiment will be described.
Referring again to FIG. 1, the control unit 200 of the system 1 according to the present embodiment includes a processor 220, a storage unit 240, and a transmission/reception unit 260.
The processor 220 mainly controls the drive units and sensors (both not shown) of the robot arm 120 and the body 140 of the robot 100, controls the environment sensor 160, processes information transmitted from the environment sensor 160, handles interaction with the control device 300, and controls the transmission/reception unit 260. The processor 220 is composed of, for example, a central processing unit (CPU), an application-specific integrated circuit (ASIC), an embedded processor, a microprocessor, a digital signal processor (DSP), or a combination thereof. The processor 220 may be composed of one or more processors.
Furthermore, the processor 220 stores in the storage unit 240, as data, for example the control signal of the robot 100 sent from the control device 300, the motion command generated in response to it, the motion actually executed by the robot 100, and the surrounding-environment data collected by the environment sensor 160 after execution, and then performs machine learning using that data to generate learning data, which is also stored in the storage unit 240. From the next time onward, the processor 220 can refer to that learning data to decide the motion the robot 100 should execute based on a control signal transmitted from the control device 300 and generate the corresponding motion command. In this way, in the present embodiment the control unit 200 of the real-world robot 100 has a machine learning function locally.
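The sketch below illustrates one possible layout for the locally stored learning records described above: each entry pairs the received control signal with the generated command, the executed motion, and the sensed result. The record structure is an assumption for illustration, not the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class ExperienceRecord:
    control_signal: Dict[str, Any]      # signal received from the control device
    motion_command: Dict[str, Any]      # command generated by the control unit
    executed_motion: Dict[str, Any]     # what the robot actually did
    environment_after: Dict[str, Any]   # sensor data collected after execution

@dataclass
class ExperienceStore:
    records: List[ExperienceRecord] = field(default_factory=list)

    def add(self, rec: ExperienceRecord) -> None:
        self.records.append(rec)

    def training_data(self) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
        """Pairs (control signal -> executed motion) usable as supervised examples."""
        return [(r.control_signal, r.executed_motion) for r in self.records]

store = ExperienceStore()
store.add(ExperienceRecord({"grasp": "bolt"}, {"joints": [0, 10, 20, 0, 0, 0]},
                           {"joints": [0, 11, 19, 0, 0, 0]}, {"grasped": True}))
print(len(store.training_data()))
```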
The storage unit 240 stores a computer program for controlling the robot 100 as described in the present embodiment, a computer program for processing information transmitted from the environment sensor 160, a computer program for interacting with the control device 300, a computer program for controlling the transmission/reception unit 260, a program for executing machine learning, and the like. Preferably, the storage unit 240 stores software or programs that cause a computer to perform the processing described in the present embodiment and thereby function as the control unit 200. In particular, the storage unit 240 stores a computer program executable by the processor 220 that includes instructions for carrying out the method described later with reference to FIG. 3 and other figures.
Furthermore, the storage unit 240 preferably stores model data of known objects as described above. The storage unit 240 also serves to store, at least temporarily, the state of each part of the robot arm 120 of the robot 100 (servos (not shown), the hand 122, and so on), information transmitted from the environment sensor 160, and information and control signals sent from the control device 300. As described above, the storage unit 240 further serves to store the motion instructions given to the robot 100, the motions of the robot 100 executed in response, and the learning data. The storage unit 240 preferably includes a non-volatile storage medium that retains its contents even when the power of the control unit 200 is turned off, for example a hard disk drive (HDD), a solid-state drive (SSD), optical disc storage such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray disc (BD), non-volatile random-access memory (NVRAM), EPROM (Erasable Programmable Read-Only Memory), or flash memory. The storage unit 240 may further include volatile storage such as static random-access memory (SRAM), but each of the computer programs described above is stored in a non-volatile (non-transitory) storage medium of the storage unit 240.
The transmission/reception unit 260 transmits and receives signals and information to and from the robot 100 and to and from the control device 300. The control unit 200 can be connected to the robot 100 by a wired or wireless connection, so those signals and information can be exchanged by wire or wirelessly. The communication protocol, frequency, and the like used for this exchange can be selected as appropriate according to the application and environment in which the robot 100 is used. The transmission/reception unit 260 may be connected to a network such as the Internet.
Furthermore, the transmission/reception unit 260 transmits and receives signals and information to and from the control device 300. The control unit 200 can be connected to the control device 300 by a wired or wireless connection, so those signals and information can be exchanged by wire or wirelessly. The communication protocol, frequency, and the like used for this exchange can be selected as appropriate according to the application and environment in which the robot 100 is used.
Although the control unit 200 is shown in FIG. 1 as being separate from the robot 100, it is not limited to that form. For example, the control unit 200 may be provided inside the housing 140 of the robot 100. Further, the number of robots 100 used in the system 1 is not limited to one; a plurality of robots 100 may be operated independently or in cooperation with one another. In that case, a single control unit 200 may control the plurality of robots 100, or a plurality of control units 200 may cooperate to control them.
Next, the control device 300 in the robot control system 1 of the present embodiment will be described.
As shown in FIG. 1, the control device 300 of the system 1 according to the present embodiment includes a processor 320, a storage unit 340, an input device 350, a transmission/reception unit 360, and a display 370.
The processor 320 mainly handles interaction with the control unit 200, processing based on input performed by the user via the input device 350, control of the transmission/reception unit 360, and display on the display 370. In particular, the processor 320 generates a control signal based on user input entered with the input device 350 and transmits it to the control unit 200. Based on that control signal, the processor 220 of the control unit 200 generates one or more motion commands for operating the drive units (not shown) of the robot arm 120 and body 140 of the robot 100 and the environment sensor 160. The processor 320 is composed of, for example, a central processing unit (CPU), an application-specific integrated circuit (ASIC), an embedded processor, a microprocessor, a digital signal processor (DSP), or a combination thereof. The processor 320 may be composed of one or more processors.
Furthermore, the processor 320 of the control device 300 is configured to generate a UI (user interface) screen to be presented to the user and display it on the display 370. The UI screen (not shown) includes, for example, selection buttons that hierarchically present a plurality of options to the user. The processor 320 also generates an image or video of a virtual world (simulation space) based on real-world images or video of the surrounding environment of the robot 100 captured by the environment sensor 160 of the robot 100, and displays it on the display 370. When generating the virtual-world image or video from the real-world image or video, the processor 320 establishes a correlation between the real world and the virtual world, for example by associating the real-world coordinate system with the virtual-world coordinate system. The real-world image or video and the virtual-world (simulation-space) image or video may also be displayed on the display 370 at the same time. Moreover, the UI screen may be displayed superimposed on the image or video of the surrounding environment of the robot 100 or on the image or video of the virtual world. The virtual-world (simulation-space) image or video generated from the real-world image or video of the robot 100's surroundings also includes the objects present in that environment. By establishing the correlation between the real world and the virtual world when generating the virtual-world image or video, it becomes possible, as described in detail below, to cause changes in the real world based on the user's operations in the virtual world and to reflect changes in the real world in the virtual world.
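One way of representing such a correspondence between coordinate systems is with homogeneous transforms, as in the sketch below; the numeric transforms are placeholders, not calibration values from the embodiment, and it is merely assumed here that the virtual world reuses the robot base frame.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Assumed: the virtual world shares the robot base frame, and the camera pose in
# that frame is known from calibration.
T_virtual_from_base = np.eye(4)
T_base_from_camera = make_transform(np.eye(3), np.array([0.2, 0.0, 0.5]))
T_virtual_from_camera = T_virtual_from_base @ T_base_from_camera

p_camera = np.array([0.1, -0.05, 0.8, 1.0])      # a point measured by the camera
p_virtual = T_virtual_from_camera @ p_camera      # the same point in the virtual world
print(p_virtual[:3])
```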
Furthermore, the processor 320 of the control device 300 obtains or generates a model corresponding to an object included in the virtual-world (simulation-space) image or video. Three ways of obtaining or generating a model can be considered: 1) obtaining information on a ready-made model, 2) scanning the object to generate a model, and 3) creating a model of the object independently.
The first approach, obtaining ready-made model information, applies when a model corresponding to the scanned object is available. In this case, the processor 320 of the control device 300 obtains the model corresponding to the object included in the visual information by referring to object models stored in the storage unit 340 or to object models available on the network to which the control device 300 is connected.
The second approach, scanning the object to generate a model, can be used when no model corresponding to the scanned object is available. In this case, for example, the user combines various primitive shape elements on the UI screen using the input device 350 to generate a model corresponding to the scanned object. Primitive shape elements include, for example, prisms and pyramids with an arbitrary number of sides, cylinders, cones, and spheres. The processor 320 may further allow the user to draw an arbitrary shape on the UI screen and add it as a primitive shape element. By selecting these various shape elements on the UI screen, adjusting the dimensions of each part of the selected elements as appropriate, and combining them to match the image of the scanned object, the user can generate a model corresponding to the scanned object. When generating a model from these elements, recesses, holes, and the like in the object can also be represented.
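As an illustration of combining primitives, the sketch below samples surface points from a cylinder (shaft) and a hexagonal prism (head) and stacks them into a rough bolt-like model; the dimensions are arbitrary and this is not the model format used by the system.

```python
import numpy as np

def cylinder_points(radius, height, n=300):
    """Sample points on the lateral surface of a cylinder standing on the XY plane."""
    theta = np.random.uniform(0, 2 * np.pi, n)
    z = np.random.uniform(0, height, n)
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta), z])

def hex_prism_points(circumradius, height, n=300):
    """Sample points on the six rectangular side faces of a hexagonal prism."""
    pts = []
    for k in range(6):
        a0, a1 = k * np.pi / 3, (k + 1) * np.pi / 3
        v0 = circumradius * np.array([np.cos(a0), np.sin(a0)])
        v1 = circumradius * np.array([np.cos(a1), np.sin(a1)])
        t = np.random.uniform(0, 1, n // 6)
        z = np.random.uniform(0, height, n // 6)
        pts.append(np.column_stack([v0[0] + t * (v1[0] - v0[0]),
                                    v0[1] + t * (v1[1] - v0[1]), z]))
    return np.vstack(pts)

shaft = cylinder_points(radius=4.0, height=30.0)
head = hex_prism_points(circumradius=8.0, height=5.0) + np.array([0, 0, 30.0])
bolt_model = np.vstack([shaft, head])    # combined primitive model
print(bolt_model.shape)
```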
The third approach, creating a model of the object independently, covers the case where the user combines various primitive shape elements on the UI screen to generate a model of an arbitrary object without scanning the object. The user operations for generating the model in this approach are the same as those described for the second approach. By creating models in advance and accumulating them in the system 1, a model can be associated with its corresponding object in the virtual world when it becomes necessary to manipulate that real object in the real world.
The processor 320 reproduces the model corresponding to the object, for example in computer graphics, and displays it in the virtual world on the display 370. The data of a model obtained from outside or generated as described above is stored in the storage unit 340 of the control device 300.
The storage unit 340 stores a program for causing the processor 320 to execute the operations described in the present embodiment, a computer program for interacting with the control unit 200, a computer program for performing processing based on input performed interactively by the user on the UI screen via the input device 350, a computer program for controlling the transmission/reception unit 360, a computer program for performing display on the display 370, and the like. Preferably, the storage unit 340 stores software or programs that cause a computer to perform the operations described later and thereby function as the control device 300. In particular, the storage unit 340 stores a computer program executable by the processor 320 that includes instructions for carrying out the method described later with reference to FIG. 4 and other figures.
Furthermore, the storage unit 340 can at least temporarily store the images or video of the surrounding environment of the robot 100 captured by the environment sensor 160 of the robot 100 and sent to the control device 300 via the control unit 200, as well as the virtual-world (simulation-space) images or video generated by the processor 320 based on them. The storage unit 340 of the control device 300 is also preferably composed of a non-volatile storage medium that retains its contents even when the power of the control device 300 is turned off, for example a hard disk drive (HDD), a solid-state drive (SSD), optical disc storage such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray disc (BD), non-volatile random-access memory (NVRAM), EPROM (Erasable Programmable Read-Only Memory), or flash memory. The storage unit 340 may further include volatile storage such as static random-access memory (SRAM), but each of the computer programs described above is stored in a non-volatile (non-transitory) storage medium of the storage unit 340.
The storage unit 340 also functions as a database of the system 1 and, as described in relation to the concept of the present invention, stores the motion data of the robot 100 operated in the real world based on the control signals (including the motion commands generated by the control unit 200) and the surrounding-environment data indicating the motion results detected by the environment sensor 160.
As the input device 350, for example, a keyboard, a mouse, or a joystick can be used; a device called a tracker, whose position and orientation can be tracked using infrared light or the like and which has a trigger button and the like, can also be used. When the display 370 includes a touch-panel display device, the touch panel can be used as an input device. Furthermore, when the display 370 is a head-mounted display used as a display device for VR (virtual reality), AR (augmented reality), MR (mixed reality), or the like and has a gaze-tracking function for the user, that gaze-tracking function can be used as an input device. Alternatively, even a device that has a gaze-tracking function but no display can have its gaze-tracking function used as an input device. A voice input device can also be used. These are only examples of the input device 350, and the means usable as the input device 350 are not limited to them; the means described above may also be combined arbitrarily and used as the input device 350. By using such an input device 350, the user can, on the UI screen displayed on the display 370, for example select a selection button, enter text, select an object included in the image or video of the surrounding environment of the robot 100 captured by the environment sensor 160, or select a virtual object included in the virtual-world (simulation-space) image or video generated based on that image or video.
The transmission/reception unit 360 transmits and receives signals and information to and from the control unit 200. As described above, the control device 300 can be connected to the control unit 200 by a wired or wireless connection, so those signals and information can be exchanged by wire or wirelessly. The communication protocol, frequency, and the like used for this exchange can be selected as appropriate according to the application and environment in which the system 1 is used. The transmission/reception unit 360 may further be connected to a network such as the Internet.
As the display 370, any form of display apparatus can be used, such as a display monitor, a computer tablet device (including those with a touch-panel display), a head-mounted display used as a display device for VR (virtual reality), AR (augmented reality), MR (mixed reality), or the like, or a projector.
In particular, when a head-mounted display is used as the display 370, the head-mounted display can make the user perceive a three-dimensional image or video by presenting images or video with parallax to the user's left and right eyes. Furthermore, when the head-mounted display has a motion-tracking function, images or video corresponding to the position and direction of the head of the user wearing it can be displayed. In addition, when the head-mounted display has a gaze-tracking function as described above, that gaze-tracking function can be used as an input device.
In the following description of the present embodiment, an example is described in which the processor 320 of the control device 300 generates an image or video of a virtual world (simulation space) based on real-space images or video of the surrounding environment of the robot 100 captured by the environment sensor 160 of the robot 100, a head-mounted display used as a display device for VR (virtual reality), AR (augmented reality), MR (mixed reality), or the like is used as the display 370, and a tracker whose position and orientation can be tracked using infrared light or the like and which has a trigger button and the like is used as the input device 350.
Next, with reference to FIGS. 3 to 5, a scenario of specifying the positions and orientations of a bolt and a nut will be described as an example of the operation of specifying the position and orientation of an object.
FIG. 3 is a flowchart explaining the method of specifying the position and orientation of an object in the present embodiment. FIGS. 4 and 5 are diagrams showing objects in the virtual world and the corresponding models.
In this example, first, as shown in step S305 of FIG. 3, the surrounding-environment information of the real-world robot 100 obtained by the environment sensor 160 of the robot 100 is transmitted to the control device 300 via the control unit 200. The visual information may be a single still image, a plurality of images, or video, and preferably also includes depth information. The control device 300 may store the transmitted visual information in the storage unit 340.
Next, as shown in step S310 of FIG. 3, the processor 320 of the control device 300 generates, based on that visual information, a virtual world (simulation space) that reproduces the surrounding environment of the robot 100, and displays it on the display 370 of the control device 300. In the virtual world, in addition to the scenery around the robot 100 in the real world, at least the real-space objects present in the area accessible to the robot 100 are displayed. An object may be represented by a two-dimensional or three-dimensional image of the real-world object obtained by the visual sensor, by a depth map, by a point cloud, or the like; alternatively, it may be represented by computer graphics representing the object. In this example, a bolt (FIG. 4(a)) and a nut (FIG. 5(a)) are displayed as objects.
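When the visual information includes depth, one common way to obtain such a point-cloud representation is standard pinhole back-projection, sketched below; the camera intrinsics are placeholder values, not those of the sensor in the embodiment.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into a point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                      # drop pixels with no depth

depth = np.full((480, 640), 1.2)                   # a flat synthetic depth image
cloud = depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)
```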
Subsequently, the user uses the input device 350 of the control device 300 to select, from a menu on the screen displayed on the display 370, the model corresponding to the target object whose position and orientation are to be specified. The processor 320 of the control device 300 then reads the data of that model from the storage unit 340 and displays the model in the virtual world shown on the display 370 (step S315 in FIG. 3). In this example, a bolt model 40 (FIG. 4(b)) and a nut model 50 (FIG. 5(b)) are displayed in the virtual world.
As described above, a model is obtained or generated in one of three ways: 1) obtaining information on a ready-made model, 2) scanning the object to generate a model, or 3) creating a model of the object independently.
Subsequently, as shown in step S320 of FIG. 3, the user superimposes, in the virtual world, the model generated as described above on the corresponding object. In the bolt example shown in FIG. 4, the user uses the input device 350 to select the model 40 shown in FIG. 4(b) on the screen displayed on the display 370, moves the model 40 within the screen, and superimposes it on the object shown in the same screen (FIG. 4(a)) so that their positions and orientations roughly match.
The processor 320 then compares the three-dimensional shape data of the object (FIG. 4(a)) (for example, the edges and/or feature points of its three-dimensional shape) with the three-dimensional shape data of the model 40, and corrects the position and orientation of the model 40 so that the contour of the three-dimensional shape of the object (FIG. 4(a)) coincides with that of the model 40. As a result, the control device 300 determines that the object displayed in the virtual world (FIG. 4(a)) is not merely a body occupying a certain volume of space but the object (a bolt) corresponding to the model 40 selected by the user, and furthermore that this object exists in the virtual world at the corrected position and orientation of the model 40 (step S325 in FIG. 3). The method of comparing the object with the model is not limited to the above; any other method may be used.
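One concrete possibility for this correction step, shown only as an illustrative sketch and not prescribed by the embodiment, is an iterative-closest-point style loop: repeatedly match each model point to its nearest object point and solve the least-squares rigid fit (Kabsch).

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation and translation mapping src onto dst (Kabsch)."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dc - R @ sc

def refine(model_pts, object_pts, iterations=20):
    """Start from the user's rough superposition and iteratively pull the model onto the object."""
    current = model_pts.copy()
    for _ in range(iterations):
        d = np.linalg.norm(object_pts[None, :, :] - current[:, None, :], axis=2)
        matches = object_pts[np.argmin(d, axis=1)]          # nearest object point per model point
        R, t = best_rigid_transform(current, matches)
        current = current @ R.T + t
    return current

rng = np.random.default_rng(1)
object_pts = rng.random((150, 3))
model_pts = object_pts + np.array([0.05, -0.03, 0.02])      # roughly superimposed by the user
aligned = refine(model_pts, object_pts)
print(np.abs(aligned - object_pts).max())                    # small residual after correction
```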
In this way, the processor 320 of the control device 300 recognizes that the object (FIG. 4(a)) is the object corresponding to the model 40, reproduces a virtual object corresponding to the object specified as described above, for example in computer graphics, and displays it superimposed on that object on the display 370. The virtual object may be reproduced using, for example, an image of the model (computer graphics). Using the input device 350, the user can perform operations such as moving the virtual object within the virtual world displayed on the display 370.
A virtual object can be moved by the user with the input device 350 in the virtual world displayed on the display 370. For example, when a tracker is used as the input device 350, by pointing at a virtual object with the tracker and then pressing the trigger button, the user can move that virtual object freely in the virtual world following the movement of the tracker for as long as the trigger button is held down. Then, by releasing the trigger button after moving the virtual object to the desired position and orientation in the virtual world, the user can end the move operation on the virtual object. By operating two trackers simultaneously with both hands, the user can also operate two virtual objects in the virtual world at the same time.
In the virtual world on the display 370, the hand 122 of the robot 100 can also be turned into an object and displayed as a virtual hand, and the user can operate the virtual hand with the tracker to move a virtual object. For example, pointing at the virtual hand with the tracker and pressing the trigger button moves the virtual hand. Then, by aligning the claws or fingers of the virtual hand with the virtual object to be moved and releasing the held trigger button, or pressing another trigger button, the virtual object can be grasped by the virtual hand and moved in that state. Thereafter, for example by moving the virtual hand with the tracker while pressing the trigger button, the virtual object can be moved while being held by the virtual hand. Then, by releasing the trigger button after moving the virtual object to the desired position and orientation in the virtual world, the move operation of the virtual hand and the virtual object can be ended.
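A minimal sketch of this grab-while-trigger-held interaction is shown below: while the trigger is pressed, the tracker's positional delta is applied to the grasped virtual object. The event names and data layout are assumptions for illustration only.

```python
import numpy as np

class VirtualObject:
    def __init__(self, position):
        self.position = np.asarray(position, dtype=float)

class TrackerGrabController:
    def __init__(self):
        self.grabbed = None
        self.last_tracker_pos = None

    def on_trigger_press(self, obj, tracker_pos):
        """Grab the pointed-at object and remember the current tracker position."""
        self.grabbed, self.last_tracker_pos = obj, np.asarray(tracker_pos, float)

    def on_tracker_move(self, tracker_pos):
        """While grabbing, move the object by the tracker's displacement."""
        if self.grabbed is not None:
            tracker_pos = np.asarray(tracker_pos, float)
            self.grabbed.position += tracker_pos - self.last_tracker_pos
            self.last_tracker_pos = tracker_pos

    def on_trigger_release(self):
        self.grabbed = None

cup = VirtualObject([0.0, 0.0, 0.0])
ctrl = TrackerGrabController()
ctrl.on_trigger_press(cup, [0.5, 0.5, 0.5])
ctrl.on_tracker_move([0.6, 0.5, 0.7])
ctrl.on_trigger_release()
print(cup.position)           # moved by the tracker delta while grabbed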
The operations on virtual objects and the virtual hand in the virtual world using the input device 350 as described above, and their display on the display 370, are controlled by the processor 320.
Furthermore, in response to such operations of the virtual hand, the processor 320 of the control device 300 generates a control signal for causing the hand 122 of the robot 100 in the real world to move the real object corresponding to the virtual object. The control signal is transmitted from the control device 300 to the control unit 200. On receiving the control signal, the control unit 200 performs motion planning for the robot 100 based on the received control signal and the surrounding-environment information detected by the environment sensor 160 of the robot 100, generates motion commands to be executed by the robot 100, and operates the robot 100 based on those motion commands.
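As a hedged sketch of how a virtual-hand operation could become a control signal and then a motion plan, the example below packages a target object pose and interpolates naive hand waypoints; real motion planning would also use the sensed environment, which is omitted here, and none of these names come from the embodiment.

```python
import numpy as np

def make_control_signal(object_id, target_position):
    """A minimal control signal: which object to move and where to put it."""
    return {"object": object_id, "target_position": list(target_position)}

def plan_waypoints(start, goal, steps=5):
    """A naive plan: linearly interpolated hand positions from start to goal."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    return [tuple(start + (goal - start) * k / steps) for k in range(1, steps + 1)]

signal = make_control_signal("mug", [0.4, -0.1, 0.2])
waypoints = plan_waypoints(start=[0.3, 0.0, 0.3], goal=signal["target_position"])
print(waypoints)
```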
As described above, according to the present embodiment, the position and orientation of an object in the virtual world are specified by superimposing, on the object in the virtual world corresponding to the real-world object, the model corresponding to that object. As explained in the background section, techniques that recognize an object's position and orientation by matching its feature points against a model require, as a prerequisite, that the target object present in the captured image be identified. In contrast, according to the present embodiment, the user identifies the target object in the virtual world. The computational cost can therefore be kept lower than when the target object and its position and orientation are identified by image processing.
Furthermore, even when the image quality of the target object in the captured image or video is low and unclear, or when part of the target object is missing, a human can often still recognize what the target object is and, moreover, what orientation it is in. Therefore, according to the present embodiment, even when the object displayed in the virtual world is somewhat unclear, the user can recognize the object based on his or her own perception, select the corresponding model, superimpose that model on the object, and thereby specify the object's position and orientation. Once the model corresponding to the object has been selected, the processor 320 of the control device 300 can uniquely specify the object's position and orientation by comparing information such as the contours, feature points, and edges of the object and the model. In this respect, even image-recognition techniques incorporating machine learning or AI have limits when the object to be recognized is unclear, so the present embodiment has the advantage of being able to recognize an object and its position and orientation by drawing on human cognitive ability. For example, when recognition of a target object's position and orientation using image-recognition techniques incorporating machine learning or AI has failed, the present embodiment allows a human to intervene and quickly correct the recognition result.
Since the virtual world is generated by the control device 300 based on the real world, the correlation between the coordinate system of the virtual world and that of the real world is known to the control device 300. Therefore, once the position and orientation of an object in the virtual world have been specified, the control device 300 can specify the position and orientation of that object in the real world.
Next, with reference to FIGS. 6 and 7, an example of specifying the position and orientation of assembly blocks will be described as another example of specifying the position and orientation of an object.
FIG. 6(a) shows a state in which a plurality of assembly blocks are placed on a table in the real world. In this example, the surrounding environment including these assembly blocks is imaged using a depth camera DC (for example, an Intel RealSense™ depth camera), the back of which is visible at the bottom of FIG. 6(a). The depth camera DC corresponds to the environment sensor 160 of the robot 100 of the present embodiment described with reference to FIGS. 1 and 2.
FIG. 6(b) shows the virtual world generated based on the image data captured by the depth camera. A plurality of blocks corresponding to the individual assembly blocks in the real world of FIG. 6(a) are displayed in the virtual world of FIG. 6(b). This virtual world is generated in the control device 300 as described above. At this stage, those blocks are recognized by the control device 300 merely as bodies occupying certain volumes of space, and the positions and orientations of the blocks have not yet been specified.
It can also be seen that some blocks are displayed with parts missing. This is because, when the blocks are imaged by the depth camera DC placed at a fixed position, sufficient image information on their far sides cannot be obtained. When part of an object is missing in this way, it is difficult to recognize the object's position and orientation correctly even with image-processing techniques using machine learning or AI. In the present embodiment, by contrast, the object and its position and orientation can be recognized by drawing on human cognitive ability.
Note that FIG. 6(b) also shows a screen indicating the types and numbers of blocks displayed in the virtual world and how they are to be combined.
FIG. 7 shows the operation of superimposing a model on a block in the virtual world to specify the block's position and orientation.
FIG. 7(a) shows the model 70 corresponding to the block blc being moved toward the block blc. Prior to this, the user has selected the model 70 in the virtual world displayed on the display 370 using the input device 350. When a tracker is used as the input device 350, by pointing at the model 70 with the tracker and then pressing the trigger button, the user can move the model 70 freely in the virtual world following the movement of the tracker for as long as the trigger button is held down. Then, by releasing the trigger button after moving the model 70 to the desired position and orientation in the virtual world, the user can end the move operation on the model 70.
FIG. 7(b) shows the model 70 superimposed on the block blc so that their positions and orientations roughly match. In this state, the user releases the trigger button of the tracker to end the move operation on the model 70.
Then, as one example, the processor 320 of the control device compares the three-dimensional shape data of the block blc (for example, edges and/or feature points) with the three-dimensional shape data of the model 70, and corrects the position and orientation of the model 70 so that the contour of the three-dimensional shape of the block blc and the contour of the three-dimensional shape of the model 70 coincide. After the position and orientation of the model 70 have been corrected, the two contours coincide, and the processor 320 of the control device highlights the matched three-dimensional contour (FIG. 7(c)).
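As one possible way to realize this correction step, the rough pose given by the user's superposition can serve as the initial value of an iterative registration between the scanned shape and the model shape. The sketch below assumes both shapes are available as point clouds and uses Open3D's ICP registration; the document itself does not prescribe a particular matching algorithm, so the library and function choices here are illustrative assumptions.

import numpy as np
import open3d as o3d

def refine_model_pose(scanned_block: o3d.geometry.PointCloud,
                      model: o3d.geometry.PointCloud,
                      initial_pose: np.ndarray,
                      max_corr_dist: float = 0.01) -> np.ndarray:
    """Return a corrected 4x4 pose of the model in the virtual-world frame.

    initial_pose is the rough pose obtained from the user's manual superposition;
    ICP refines it so that the model contour coincides with the scanned contour.
    """
    result = o3d.pipelines.registration.registration_icp(
        model, scanned_block, max_corr_dist, initial_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation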
In this way, the position and orientation of the block blc in the virtual world are specified. Further, based on the correlation between the coordinate system of the virtual world and the coordinate system of the real world, the processor 320 of the control device can specify the position and orientation in the real world of the real block corresponding to the block blc in the virtual world. As a result, the system 1 recognizes the position and orientation of the block in the real world, and the robot 100 can perform a desired operation on that block.
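A minimal sketch of this virtual-to-real conversion, assuming the correlation between the two coordinate systems is available as a single 4x4 homogeneous transform (the name T_real_from_virtual is illustrative):

import numpy as np

def to_real_world(pose_virtual: np.ndarray,
                  T_real_from_virtual: np.ndarray) -> np.ndarray:
    """Convert a 4x4 object pose from virtual-world to real-world coordinates."""
    return T_real_from_virtual @ pose_virtual

# Once the real-world pose is known, it can be handed to the robot's motion
# planner as, for example, a grasp target.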
(Other objects)
Examples of other objects whose positions and orientations are specified by the method of the present embodiment are given below.
[First example]
FIG. 8 is a diagram showing the robot (unmanned submersible) used in this example and a pipe that is the target of underwater work. As shown in FIG. 8, the robot 100 used in this example has the form of an unmanned submersible and includes a robot arm 120 having a robot hand 122 at its tip and a housing 140 on which the robot arm 120 is mounted. The housing 140 is provided with a plurality of thrusters (not shown) that allow the robot 100 to move underwater in the left-right direction (X-axis direction), the front-rear direction (Y-axis direction), and the up-down direction (Z-axis direction), and to rotate about each of the X, Y, and Z axes. These thrusters are composed of, for example, propellers rotated by electric motors. Although not explicitly shown in the figure, the housing 140 is provided with the environment sensor and the transmission/reception unit described with reference to FIG. 2 and elsewhere. In particular, the housing 140 is provided with at least a visual sensor as an environment sensor, whereby visual information on the surrounding environment of the robot 100 (in particular, the environment including the robot arm 120 and the hand 122 in front of the robot 100) can be acquired. The other components of the robot (unmanned submersible) 100 used in this example are the same as those described above with reference to FIG. 2, so detailed description is omitted here. The system configuration and control method used in this example are also the same as those described above. This example focuses on the characteristic points of the task of grasping an underwater pipe with the robot hand at the tip of the robot arm of the robot in the form of an unmanned submersible.
Note that FIG. 8 shows a virtual world generated on the basis of the environment information acquired by the environment sensor of the robot (unmanned submersible) 100. The shape and function of each part of the robot (unmanned submersible) 100 are modeled and stored in advance at least in the storage unit 340 of the control device 300, and are therefore known to the system 1. For this reason, the modeled robot (unmanned submersible) 100 is displayed in the virtual world. On the other hand, the pipe 60 in the virtual world is displayed in a state reproduced on the basis of the environment information acquired by the environment sensor of the robot (unmanned submersible) 100. Since the pipe 60 is imaged by the environment sensor of the unmanned submersible 100 only from one particular direction, it is reproduced with the shape recognizable from that direction, and the portion on the opposite side is not reproduced. In FIG. 8, the shape of the left portion of the pipe 60 in the drawing is reproduced, while the right portion of the pipe 60 in the drawing is displayed in a missing state.
FIG. 9 is a diagram showing how the position and orientation of the pipe shown in FIG. 8 are specified in the virtual world.
FIG. 9 shows the virtual world generated by the control device 300 according to steps S305 and S310 shown in FIG. 3. In this example, for the pipe 60 displayed in the virtual world on the display 370, the user creates a model corresponding to the object according to the above-described mode "2) Scanning the object to generate a model", and the position and orientation of the object are specified using that model.
More specifically, first, the pipe 60, which is the scanned object, is displayed on the display 370 as shown in FIG. 9(a). Next, in the UI screen displayed on the display 370, the user generates a cylindrical model using the tracker 350 (the corresponding virtual tracker 350_vr is shown in FIG. 9) (FIG. 9(b)), moves it so that it is superimposed on the pipe 60 in the virtual world displayed on the display 370 (FIG. 9(c)), and adjusts its diameter and length (FIG. 9(d)). In this way, the display of the model corresponding to the pipe 60, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
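A short sketch of such a parametric primitive, assuming the cylinder is represented by a diameter and a length that the user adjusts interactively; the use of Open3D to build the display mesh is an illustrative choice, not something specified in this document.

from dataclasses import dataclass
import open3d as o3d

@dataclass
class CylinderModel:
    diameter: float  # metres, adjusted by the user
    length: float    # metres, adjusted by the user

    def to_mesh(self) -> o3d.geometry.TriangleMesh:
        # Build a display mesh for the current parameter values.
        return o3d.geometry.TriangleMesh.create_cylinder(
            radius=self.diameter / 2.0, height=self.length)

# The user starts from a default primitive and tunes its dimensions until it
# matches the scanned pipe (the numbers below are placeholders).
pipe_model = CylinderModel(diameter=0.20, length=1.50)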
Subsequently, the processor 320 compares the three-dimensional shape data of the pipe 60, which is the object (for example, edges and/or feature points of the three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the pipe 60 and the contour of the three-dimensional shape of the model coincide. As a result, the control device 300 specifies that the pipe 60 displayed in the virtual world is not merely an object occupying a certain volume of space but an object (the pipe 60) corresponding to the model selected by the user, and that the pipe 60 exists in the virtual world at the corrected position and orientation of the model (step S325 in FIG. 3).
The pipe 60 whose position and orientation have been specified in this way is displayed in the virtual world as a virtual pipe 60_vr by, for example, a computer graphics representation of the model. This makes it possible, for example, to instruct an operation of grasping the virtual object of the pipe 60 (the virtual pipe 60_vr) with the robot hand 122 of the robot (unmanned submersible) 100.
[Second example]
FIG. 10 is a diagram showing a mug 80 placed on a table.
The mug 80 has a handle 82 and a main body 84. The handle 82 of the mug 80 can be gripped by the hand 122 provided on the arm 120 of the robot 100 described with reference to FIGS. 1 and 2; with the hand 122 gripping the handle 82, the robot arm 120 can, for example, move the mug 80 on the table.
FIG. 11 is a diagram showing how the position and orientation of the mug 80 on the table shown in FIG. 10 are specified in the virtual world. FIG. 11 shows the virtual world generated by the control device 300 according to steps S305 and S310 shown in FIG. 3.
In this example, for the mug 80 displayed in the virtual world on the display 370, the user creates a model corresponding to the object according to the above-described mode "2) Scanning the object to generate a model", and the position and orientation of the object are specified using that model.
More specifically, first, the mug 80, which is the scanned object, is displayed on the display 370 as shown in FIG. 11(a). Next, in the UI screen displayed on the display 370, the user generates a cylindrical model using the tracker 350 (the corresponding virtual tracker 350_vr is shown in FIG. 11), moves it so that it is superimposed on the main body 84 of the mug 80 in the virtual world displayed on the display 370 (FIG. 11(a)), and adjusts its diameter and length (FIG. 11(b)). In this way, the display of the model corresponding to the main body 84 of the mug 80, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
Subsequently, the processor 320 compares the three-dimensional shape data of the main body 84, which is the object (for example, edges and/or feature points of the three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the main body 84 and the contour of the three-dimensional shape of the model coincide. As a result, the control device 300 specifies that the main body 84 displayed in the virtual world is not merely an object occupying a certain volume of space but an object (the main body 84) corresponding to the model selected by the user, and that the main body 84 exists in the virtual world at the corrected position and orientation of the model (step S325 in FIG. 3).
Similarly, for the handle 82 of the mug 80, the tracker 350 (the virtual tracker 350_vr shown in FIG. 11) is used to generate a rectangular parallelepiped model corresponding to the handle 82 of the mug 80; this model is moved so that it is superimposed on the handle 82 of the mug 80 in the virtual world displayed on the display 370, and its height, width, and depth are adjusted (FIG. 11(d)). In this way, the display of the model corresponding to the handle 82 of the mug 80, which is the object (step S315 in FIG. 3), and the superposition of the model on the object (step S320 in FIG. 3) are performed.
Subsequently, the processor 320 compares the three-dimensional shape data of the handle 82, which is the object (for example, edges and/or feature points of the three-dimensional shape), with the three-dimensional shape data of the model, and corrects the position and orientation of the model so that the contour of the three-dimensional shape of the handle 82 and the contour of the three-dimensional shape of the model coincide. As a result, the control device 300 specifies that the handle 82 displayed in the virtual world is not merely an object occupying a certain volume of space but an object (the handle 82) corresponding to the model selected by the user, and that the handle 82 exists in the virtual world at the corrected position and orientation of the model (step S325 in FIG. 3).
Next, in the UI screen displayed on the display 370, the user performs an input operation using the tracker 350 indicating that the handle 82 and the main body 84 of the mug 80 form one integral object. The processor 320 of the control device 300 thereby recognizes that the handle 82 and the main body 84 are integrated at their respective specified positions and orientations.
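A minimal sketch of one way the control device could represent this grouping internally, with the two fitted primitives stored as parts of a single rigid object; the data structure and names are assumptions for illustration only, not the data model required by this document.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class RigidPart:
    name: str
    pose: np.ndarray  # 4x4 pose of the fitted primitive in the virtual world

@dataclass
class CompositeObject:
    name: str
    parts: list = field(default_factory=list)

    def add_part(self, part: RigidPart):
        self.parts.append(part)

# Group the fitted body and handle so that a grasp planned on the handle moves
# the whole mug (identity poses below are placeholders for the fitted poses).
mug = CompositeObject("mug_80")
mug.add_part(RigidPart("body_84", pose=np.eye(4)))
mug.add_part(RigidPart("handle_82", pose=np.eye(4)))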
The mug 80 whose position and orientation have been specified in this way is displayed in the virtual world as a virtual mug by, for example, a computer graphics representation of the models of the handle 82 and the main body 84. This makes it possible, for example, to instruct in the virtual world an operation of grasping and moving the virtual object of the mug 80 with the robot hand 122 of the robot 100 in the virtual world (FIG. 12(a)), so that the real-world mug 80 is grasped and moved by the robot hand 122 of the robot 100 in the real world (FIG. 12(b)).
Note that, although this example describes specifying the positions and orientations of both the handle 82 and the main body 84 of the mug 80, if the purpose is, for example, to grip the handle 82 with the robot hand 122 of the robot 100 and move the mug 80, the position and orientation may be specified only for the handle 82, which is a part of the mug 80.
Although various examples of objects whose positions and orientations are specified by the method of the present embodiment have been described, the objects whose positions and orientations can be specified by the method of the present embodiment are not limited to the above; the position and orientation can be specified for any object.
In the description of the present embodiment, the robot 100 including an arm with a hand has been illustrated as the form of the robot, but the form of the robot controlled by the present invention is not limited thereto; for example, the robot may take the form of a vehicle, a ship, a submersible, a drone, or a construction machine (power shovel, bulldozer, excavator, crane, etc.). In addition to the environments and applications described in the present embodiment, robots that can be operated using the system of the present embodiment may be used in a wide variety of environments and applications, such as space development, mining, excavation, resource extraction, agriculture, forestry, fisheries, livestock farming, search and rescue, disaster relief, disaster recovery, humanitarian assistance, explosive ordnance disposal, removal of obstacles on a route, disaster monitoring, and security surveillance. The objects operated by the robot of the present embodiment vary depending on the environment and application in which the robot is used. As one example, when a power shovel is used as the robot, the soil, sand, and the like to be dug are also objects.
Although the present invention has been described above through embodiments of the invention, the above-described embodiments do not limit the invention according to the claims. Forms that combine the features described in the embodiments of the present invention may also be included in the technical scope of the present invention. Furthermore, it will be apparent to those skilled in the art that various changes or improvements can be made to the above-described embodiments.
1 Robot control system
100 Robot
200 Control unit
300 Control device

Claims (8)

  1.  A method of specifying the position and orientation of an object, the method comprising:
      generating a virtual world including a display of the object in the real world;
      displaying, in the virtual world, a model corresponding to the object;
      superimposing, in the virtual world, the model on the object; and
      comparing the object with the model to specify the position and orientation of the object.
  2.  The method according to claim 1, wherein comparing the object with the model comprises comparing three-dimensional shape data of the object with three-dimensional shape data of the model.
  3.  The method according to claim 1, further comprising correcting the position and orientation of the model so that a contour of the three-dimensional shape of the object and a contour of the three-dimensional shape of the model coincide.
  4.  A device for specifying the position and orientation of an object, the device comprising a processor configured to execute:
      generating a virtual world including a display of the object in the real world;
      displaying, in the virtual world, a model corresponding to the object;
      superimposing, in the virtual world, the model on the object; and
      comparing the object with the model to specify the position and orientation of the object.
  5.  The device according to claim 4, wherein comparing the object with the model comprises comparing three-dimensional shape data of the object with three-dimensional shape data of the model.
  6.  The device according to claim 4, wherein the processor is further configured to correct the position and orientation of the model so that a contour of the three-dimensional shape of the object and a contour of the three-dimensional shape of the model coincide.
  7.  A computer program executable by a processor, the computer program comprising instructions for carrying out the method according to any one of claims 1 to 3.
  8.  A non-transitory computer-readable medium storing a computer program executable by a processor, the computer program comprising instructions for carrying out the method according to any one of claims 1 to 3.
PCT/JP2020/019689 2019-05-17 2020-05-18 Method and device for specifying position and posture of object WO2020235539A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019094125 2019-05-17
JP2019-094125 2019-05-17

Publications (1)

Publication Number Publication Date
WO2020235539A1 (en)

Family

ID=73458503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/019689 WO2020235539A1 (en) 2019-05-17 2020-05-18 Method and device for specifying position and posture of object

Country Status (1)

Country Link
WO (1) WO2020235539A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008046750A (en) * 2006-08-11 2008-02-28 Canon Inc Image processor and image processing method
JP2011174880A (en) * 2010-02-25 2011-09-08 Canon Inc Method for estimating position and attitude, and device therefor
JP2015532077A (en) * 2012-09-27 2015-11-05 メタイオ ゲゼルシャフト ミット ベシュレンクテル ハフツングmetaio GmbH Method for determining the position and orientation of an apparatus associated with an imaging apparatus that captures at least one image
JP2014127092A (en) * 2012-12-27 2014-07-07 Seiko Epson Corp Image processing apparatus, image processing method, robot, and program


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20810229; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20810229; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)