WO2019138834A1 - Information processing device, information processing method, program, and system - Google Patents

Information processing device, information processing method, program, and system

Info

Publication number
WO2019138834A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
unit
agv
orientation
imaging unit
Prior art date
Application number
PCT/JP2018/047022
Other languages
French (fr)
Japanese (ja)
Inventor
誠 冨岡
鈴木 雅博
小林 俊広
片山 昭宏
藤木 真和
小林 一彦
小竹 大輔
修一 三瓶
智行 上野
知弥子 中島
聡美 永島
Original Assignee
キヤノン株式会社 (Canon Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by キヤノン株式会社 (Canon Inc.)
Publication of WO2019138834A1



Classifications

    • G01B 11/10: Measuring arrangements characterised by the use of optical techniques for measuring diameters of objects while moving
    • G01B 11/24: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G01B 11/245: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures using a plurality of fixed, simultaneously operating transducers
    • G01B 11/25: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object
    • G01B 11/26: Measuring arrangements characterised by the use of optical techniques for measuring angles or tapers; for testing the alignment of axes
    • G01C 11/14: Photogrammetry or videogrammetry; interpretation of pictures by comparison of two or more pictures of the same area, the pictures being supported in the same relative position as when they were taken, with optical projection
    • G05D 1/02: Control of position, course or altitude of land, water, air, or space vehicles; control of position or course in two dimensions
    • G06T 7/70: Image analysis; determining position or orientation of objects or cameras
    • G08G 1/16: Traffic control systems for road vehicles; anti-collision systems

Definitions

  • The present invention relates to technology for performing movement control of a moving body.
  • It concerns moving bodies such as automated guided vehicles (AGVs) and autonomous mobile robots (AMRs).
  • Conventionally, as in Patent Document 1, a tape is attached to the floor, and the moving body travels while detecting the tape with a sensor mounted on the moving body.
  • In Patent Document 1, however, the tape must be re-applied every time the travel route is changed due to a layout change of objects in the environment in which the moving body travels, which takes time and effort. It is therefore required to reduce this effort and to run the moving body stably.
  • The present invention has been made in view of the above problems, and an object thereof is to provide an information processing apparatus that stably performs movement control of a moving body. It also aims to provide a corresponding method and program.
  • An information processing apparatus has the following configuration.
  • Input means for receiving an input of image information acquired by an imaging unit that is mounted on a moving body and in which each light receiving unit on the imaging element is configured by two or more light receiving elements; holding means for holding map information; acquisition means for acquiring the position and orientation of the imaging unit based on the image information and the map information; and control means for obtaining a control value for controlling the movement of the moving body based on the position and orientation acquired by the acquisition means.
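The configuration above can be summarized in a short sketch. The following Python skeleton is purely illustrative (class and method names are assumptions, not taken from the patent); it only shows how the input, holding, acquisition/calculation, and control means described in the embodiments fit together.

```python
# Illustrative skeleton only; names and interfaces are assumptions.
class InformationProcessingApparatus:
    def __init__(self, holding_unit, calculation_unit, control_unit):
        self.holding = holding_unit      # holds map information and target pose
        self.calc = calculation_unit     # estimates pose from image info + map
        self.control = control_unit      # turns pose into actuator commands

    def step(self, image_info):
        """One cycle: image information in, control value out."""
        pose = self.calc.estimate_pose(image_info, self.holding.map_info)
        return self.control.compute(pose, self.holding.target_pose)
```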
  • According to the present invention, movement control of the moving body can be performed stably.
  • FIG. 2 is a diagram for explaining a system configuration in the first embodiment.
  • FIG. 2 is a diagram for explaining a functional configuration in the first embodiment.
  • FIG. 7 is a diagram for explaining an imaging element D150 included in the imaging unit 110.
  • FIG. 7 is a diagram for explaining an imaging element D150 included in the imaging unit 110.
  • FIG. 7 is a diagram for explaining an imaging element D150 included in the imaging unit 110.
  • FIG. 6 is a view showing an example of images D154a to D154d captured by the imaging unit 110.
  • 3 is a flowchart showing the flow of processing of the device of the first embodiment.
  • FIG. 2 is a diagram showing a hardware configuration of the device of Embodiment 1.
  • 10 is a flowchart showing a procedure of correction processing of visual information using motion stereo in the second embodiment.
  • FIG. 7 is a diagram for explaining a functional configuration in a third embodiment.
  • FIG. 13 is a diagram for explaining a functional configuration in a fourth embodiment.
  • 16 is a flowchart illustrating a procedure of correction processing of visual information using a measurement result of the three-dimensional measurement device in the fourth embodiment.
  • 16 is a flowchart showing a processing procedure of object detection and calculation of position and orientation in the fifth embodiment.
  • 16 is a flowchart showing a processing procedure of semantic area division of visual information in the sixth embodiment.
  • FIG. 18 is a diagram for explaining a functional configuration in an eighth embodiment.
  • A flowchart showing the flow of processing of the device of the eighth embodiment.
  • In Embodiment 1, movement control of a moving body referred to as an automated guided vehicle (AGV) or an autonomous mobile robot (AMR) will be described.
  • FIG. 1 shows a system configuration diagram in the present embodiment.
  • The information processing system 1 in the present embodiment includes a plurality of mobile bodies 12 (12-1, 12-2, ...), a process management system 14, and a mobile management system 13.
  • The information processing system 1 is, for example, a distribution system or a production system.
  • The plurality of mobile bodies 12 (12-1, 12-2, ...) are transport vehicles (AGVs) that transport objects in accordance with the process schedule determined by the process management system.
  • a plurality of mobile units move (run) within the environment.
  • The process management system 14 manages the processes performed by the information processing system; for example, it is an MES (Manufacturing Execution System) that manages processes in a factory or a distribution warehouse. It communicates with the mobile management system 13.
  • The mobile management system 13 is a system that manages the mobile bodies. It communicates with the process management system 14. In addition, it communicates with the mobile bodies (for example, via Wi-Fi), and operation information is transmitted and received bidirectionally.
  • FIG. 2 is a diagram showing an example of a hardware configuration of the mobile unit 12 including the information processing apparatus 10 in the present embodiment.
  • the information processing apparatus 10 includes an input unit 1110, a calculation unit 1120, a holding unit 1130, and a control unit 1140.
  • the input unit 1110 is connected to the imaging unit 110 mounted on the moving body 12.
  • the controller 1140 is connected to the actuator 120.
  • A communication device (not shown) communicates information bidirectionally with the mobile management system 13, and inputs and outputs data to and from the various units of the information processing apparatus 10.
  • FIG. 2 is an example of a device configuration.
  • FIG. 3 is a diagram for explaining the imaging device D150 provided in the imaging unit 110.
  • the imaging unit 110 internally includes an imaging device D150.
  • a large number of light receiving units D151 are arranged in a lattice shape inside the imaging device D150.
  • FIG. 3A shows four light receiving units.
  • A microlens D153 is provided on the upper surface of each light receiving unit D151 so that light can be collected efficiently.
  • A conventional imaging device includes one light receiving element per light receiving unit D151.
  • In contrast, in the imaging device D150, each light receiving unit D151 includes a plurality of light receiving elements D152.
  • FIG. 3B shows one light receiving unit D151 as viewed from the side.
  • Two light receiving elements D152a and D152b are provided in one light receiving unit D151.
  • The individual light receiving elements D152a and D152b are independent of each other: the charge accumulated in the light receiving element D152a does not move to the light receiving element D152b, and conversely, the charge accumulated in the light receiving element D152b does not move to the light receiving element D152a. Therefore, in FIG. 3B, the light receiving element D152a receives the light flux incident from the right side of the microlens D153, while the light receiving element D152b receives the light flux incident from the left side of the microlens D153.
  • The imaging unit 110 can select only the charge accumulated in the light receiving element D152a to generate the image D154a. At the same time, the imaging unit 110 can select only the charge accumulated in the light receiving element D152b to generate the image D154b.
  • The image D154a is generated by selecting only the light from the right side of the microlens D153, and the image D154b is generated by selecting only the light from the left side of the microlens D153. Therefore, as shown in the figure, they are images captured from different viewpoints.
  • Furthermore, the imaging unit 110 forms an image from each light receiving unit D151 using the charges accumulated in both of the light receiving elements D152a and D152b.
  • As a result, an image D154e (not shown), which is an image captured from a certain viewpoint, is obtained.
  • In other words, the imaging unit 110 can simultaneously capture the images D154a and D154b, which have different shooting viewpoints, and the conventional image D154e according to the principle described above.
  • each light receiving unit D151 may include more light receiving elements D152, and an arbitrary number of light receiving elements D152 can be set.
  • FIG. 3C shows an example in which four light receiving elements D152a to D152d are provided inside the light receiving part D151.
  • The imaging unit 110 can perform a corresponding point search on the pair of images D154a and D154b to calculate a parallax image D154f (not illustrated), and can further calculate the three-dimensional shape of an object by a stereo method based on the parallax image.
  • Corresponding point search and stereo methods are known techniques, and various methods can be applied.
  • For example, a template matching method that searches for similar regions, using several pixels around each pixel of the image as a template, may be used; alternatively, feature points such as edges or corners may be extracted from the gradient of the luminance information of the image, and similar points searched for.
  • In the stereo method, the relationship between the coordinate systems of the two images is derived, a projective transformation matrix is obtained, and the three-dimensional shape is calculated.
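For illustration, the parallax-to-depth computation described above can be sketched as follows. This is not the patent's implementation: it assumes the two sub-pixel images D154a and D154b behave like a rectified stereo pair with a small baseline, uses OpenCV block matching for the corresponding point search, and uses illustrative parameter values.

```python
# Illustrative sketch only; parameter values and the use of StereoBM are assumptions.
import cv2
import numpy as np

def depth_from_dual_pixel_pair(img_a, img_b, focal_length_px, baseline_m):
    """img_a, img_b: 8-bit grayscale images (stand-ins for D154a, D154b)."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    # StereoBM returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(img_a, img_b).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan          # mark invalid matches
    # Classic stereo relation: depth = f * B / d.
    depth = focal_length_px * baseline_m / disparity
    return disparity, depth
```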
  • the imaging unit 110 has a function of outputting an image D154a, an image D154b, a parallax image D154f, a depth map D154d obtained by the stereo method, and a three-dimensional point group D154c in addition to the image D154e.
  • Here, the depth map refers to an image that holds, for each pixel, a value correlated with the distance to the measurement object.
  • The value correlated with the distance to the measurement object is an integer value that can be stored as a normal image, and it can be converted into the physical distance to the object (for example, in millimeters) by multiplying it by a predetermined coefficient determined from the focal length. The focal length is included in the unique information of the imaging unit 110 described below.
  • Since the imaging unit 110 can obtain the pair of images D154a and D154b with different viewpoints using a single imaging device D150, it can realize three-dimensional measurement with a more compact configuration than the conventional stereo method, which requires two or more imaging units.
  • The imaging unit 110 further includes an autofocus mechanism that controls the focal length of the optical system and a zoom mechanism that controls the angle of view.
  • the auto focus mechanism can be switched on or off, and the set focal length can be fixed.
  • The imaging unit 110 can read a control value, defined by a drive amount such as the rotation angle or movement amount of the optical system control motor provided to control the focus and the angle of view, and can calculate and output the focal length by referring to a lookup table (not shown).
  • Furthermore, the imaging unit 110 can read, from the mounted lens, unique information of the lens such as the focal length range, aperture, distortion coefficients, and optical center.
  • The read unique information is used for correcting the lens distortion of the parallax image D154f and the depth map D154d described later, and for calculating the three-dimensional point group D154c.
  • The imaging unit 110 has a function of correcting the lens distortion of the images D154a and D154b, the parallax image D154f, and the depth map D154d, and of outputting the image coordinates of the principal point position (hereinafter referred to as the image center) and the baseline length of the images D154a and D154b. It also has a function of outputting the generated images, optical system data such as the focal length and the image center, and three-dimensional measurement data such as the parallax image D154f, the baseline length, the depth map D154d, and the three-dimensional point group D154c. In the present embodiment, these data are collectively referred to as image information (hereinafter also referred to as "visual information").
  • the imaging unit 110 selectively outputs all or part of the image information in accordance with a parameter set in a storage area (not shown) provided inside the imaging unit 110 or an instruction given from the outside of the imaging unit 110.
  • The movement control in the present embodiment controls the motor, which is an actuator included in the moving body, and the steering that changes the direction of the wheels. By controlling these, the moving body is moved to a predetermined destination. The control value is a command value for controlling the moving body.
  • The position and orientation of the imaging unit in the present embodiment are represented by six parameters: three parameters indicating the position of the imaging unit 110 in an arbitrary world coordinate system defined in the real space, and three parameters indicating its orientation. Note that the mounting position and orientation of the imaging unit with respect to the center of gravity of the moving body are measured at the design stage of a moving body such as an AGV, and a matrix representing this mounting position and orientation is stored in the external memory H14. The center-of-gravity position of the AGV can be calculated by multiplying the position and orientation of the imaging unit by the matrix representing the mounting position and orientation. For this reason, in the present embodiment, the position and orientation of the imaging unit are treated as synonymous with the position and orientation of the AGV.
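For illustration, the composition of the imaging unit's pose with the stored mounting matrix amounts to a single matrix product. This sketch assumes 4x4 homogeneous transforms; the function name is hypothetical.

```python
import numpy as np

def agv_pose_from_camera_pose(T_world_camera, T_camera_agv):
    """Both arguments are 4x4 homogeneous transforms (rotation + translation).
    T_camera_agv is the fixed mounting matrix measured at design time and
    stored in the external memory H14; the product gives the AGV pose in the
    world coordinate system."""
    return T_world_camera @ T_camera_agv
```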
  • a three-dimensional coordinate system defined on the imaging unit with the optical axis of the imaging unit 110 as the Z axis, the horizontal direction of the image as the X axis, and the vertical direction as the Y axis is called an imaging unit coordinate system.
  • The input unit 1110 inputs, in time series (for example, 60 frames per second), a depth map in which a depth value is stored for each pixel of an image of the scene, as the image information (visual information) acquired by the imaging unit 110, and outputs it to the calculation unit 1120.
  • the depth value is the distance between the imaging unit 110 and an object in the scene.
  • the calculation unit 1120 calculates and acquires the position and orientation of the imaging unit using the depth map input by the input unit 1110 and map information serving as an index of position and orientation calculation held by the holding unit 1130. The map information will be described later.
  • the calculation unit 1120 further outputs the calculated position and orientation to the control unit 1140.
  • the calculation unit may obtain information necessary for outputting the position and orientation from the input unit and may simply compare the information with the map information held by the holding unit 1130.
  • the holding unit 1130 holds a point cloud as map information.
  • the point cloud is three-dimensional point cloud data of a scene.
  • the point cloud is held by the holding unit 1130 as a data list storing three values of three-dimensional coordinates (X, Y, Z) in an arbitrary world coordinate system.
  • Three-dimensional point cloud data indicates three-dimensional position information.
  • In addition, the holding unit 1130 holds a target position and orientation, consisting of the three-dimensional coordinates that are the destination of the AGV and the orientation to be taken there.
  • the target position and orientation may be one or more, but for the sake of simplicity, an example in which the target position and orientation is one point will be described.
  • the holding unit 1130 outputs map information to the calculation unit 1120 as needed. Furthermore, the target position and orientation are output to the control unit 1140.
  • The control unit 1140 calculates a control value for controlling the AGV based on the position and orientation of the imaging unit 110 calculated by the calculation unit 1120, the map information held by the holding unit 1130, and the operation information input by the communication device (not shown).
  • the calculated control value is output to the actuator 120.
  • FIG. 6 is a diagram showing a hardware configuration of the information processing apparatus 1.
  • a CPU H11 controls various devices connected to the system bus H21.
  • H12 is a ROM, which stores a BIOS program and a boot program.
  • H13 is a RAM, which is used as a main storage device of the CPU H11.
  • An external memory H14 stores a program processed by the information processing apparatus 1.
  • the input unit H15 is a keyboard, a mouse, or a robot controller, and performs processing related to input of information and the like.
  • the display unit H16 outputs the calculation result of the information processing device 1 to the display device according to the instruction from H11.
  • the display device may be of any type such as a liquid crystal display device, a projector, or an LED indicator.
  • the display unit H16 included in the information processing apparatus may play a role as a display device.
  • a communication interface H17 performs information communication via a network.
  • the communication interface may be Ethernet (registered trademark), and may be of any type such as USB, serial communication, or wireless communication.
  • Information is exchanged with the mobile object management system 13 described above via the communication interface H17.
  • H18 is I / O, and inputs image information (visual information) from the imaging device H19.
  • the imaging device H19 is the imaging unit 110 described above.
  • H20 is the actuator 120 described above.
  • FIG. 5 is a flowchart showing the processing procedure of the information processing apparatus 10 in the present embodiment.
  • the processing steps include initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S140, control value calculation S150, control of AGV S160, and system termination determination S170.
  • In step S110, the system is initialized. That is, the program is read from the external memory H14, and the information processing apparatus 10 is made operable.
  • The parameters of each device connected to the information processing apparatus 10 (such as the internal parameters and focal length of the imaging unit 110) and the initial position and orientation of the imaging unit 110 are read into the RAM H13, the latter as the previous-time position and orientation.
  • Each device of the AGV is started and put into a state in which it can be operated and controlled.
  • Operation information is received from the mobile management system through the communication I/F (H17), the three-dimensional coordinates of the destination to which the AGV should head are received, and they are held in the holding unit 1130.
  • In step S120, the imaging unit 110 acquires visual information and inputs it to the input unit 1110.
  • In the present embodiment, the visual information is a depth map, and it is assumed that the imaging unit 110 has acquired the depth map (D154d) by the method described above.
  • In step S130, the input unit 1110 acquires the depth map acquired by the imaging unit 110.
  • the depth map is a two-dimensional array list storing the depth value of each pixel.
  • In step S140, the calculation unit 1120 calculates the position and orientation of the imaging unit 110 using the depth map input by the input unit 1110 and the map information held by the holding unit 1130. Specifically, first, a three-dimensional point group defined in the imaging unit coordinate system is calculated from the depth map: a three-dimensional point (X_t, Y_t, Z_t) is calculated by Equation 1 from the image coordinates (u_t, v_t), the internal parameters (f_x, f_y, c_x, c_y) of the imaging unit 110, and the depth value D of each pixel of the depth map.
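Equation 1 itself is not reproduced in this text. For a pinhole camera with the internal parameters listed above, the standard back-projection it presumably refers to would take the form:

$$X_t = \frac{(u_t - c_x)\,D}{f_x}, \qquad Y_t = \frac{(v_t - c_y)\,D}{f_y}, \qquad Z_t = D$$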
  • Next, using the previous-time position and orientation of the imaging unit 110, the three-dimensional point group is transformed into the coordinate system of the previous-time position and orientation. That is, the three-dimensional point group is multiplied by the matrix of the previous-time position and orientation.
  • the position and orientation are calculated such that the sum of the distances between the nearest three-dimensional points of the calculated three-dimensional point group and the point cloud of the map information held by the holding unit 1130 is reduced.
  • the position and orientation of the imaging unit 110 with respect to the previous time position and orientation are calculated using an ICP (Iterative Closest Point) algorithm.
  • Finally, the result is converted into the world coordinate system, and the position and orientation in the world coordinate system are output to the control unit 1140.
  • The calculated position and orientation are also stored in the RAM H13, overwriting the previous-time position and orientation.
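The S140 alignment can be sketched as follows. This is a simplified, illustrative stand-in for the ICP-based calculation described above (point-to-point ICP with a closed-form rigid update); function names, the use of a k-d tree, and the iteration count are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (Equation 1 form)."""
    v, u = np.indices(depth.shape)
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                      # drop invalid pixels

def icp(src, map_points, T_init, iterations=20):
    """src: Nx3 camera-frame points, map_points: Mx3 world points (the map),
    T_init: 4x4 previous-time pose. Returns a refined 4x4 pose."""
    T = T_init.copy()
    tree = cKDTree(map_points)
    for _ in range(iterations):
        src_w = (T[:3, :3] @ src.T).T + T[:3, 3]   # current world-frame points
        _, idx = tree.query(src_w)                 # nearest map points
        tgt = map_points[idx]
        # Closed-form rigid alignment (Kabsch) of src_w onto tgt.
        mu_s, mu_t = src_w.mean(0), tgt.mean(0)
        H = (src_w - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                   # fix reflection case
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        dT = np.eye(4)
        dT[:3, :3] = R
        dT[:3, 3] = t
        T = dT @ T                                  # accumulate the increment
    return T
```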
  • In step S150, the control unit 1140 calculates a control value for controlling the AGV. Specifically, the control value is calculated so that the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 decreases. The control value calculated by the control unit 1140 is output to the actuator 120.
  • In step S160, the actuator 120 controls the AGV using the control value calculated by the control unit 1140.
  • In step S170, it is determined whether to end the system. Specifically, if the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 is equal to or less than a predetermined threshold, the processing ends because the AGV has arrived at the destination. If not, the process returns to step S120 and processing continues.
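A toy version of the control loop in steps S150 to S170 might look like the following. The differential-drive velocity model, gains, and thresholds are assumptions for illustration; the text above only specifies that the control value reduces the Euclidean distance to the destination and that processing stops below a threshold.

```python
import math

def compute_control(pose_xyyaw, goal_xy, k_v=0.5, k_w=1.5, stop_dist=0.05):
    """Return (forward velocity, turn rate, arrived?) for one control cycle."""
    x, y, yaw = pose_xyyaw
    dx, dy = goal_xy[0] - x, goal_xy[1] - y
    dist = math.hypot(dx, dy)
    if dist <= stop_dist:                     # S170: arrived, stop
        return 0.0, 0.0, True
    heading_err = math.atan2(dy, dx) - yaw
    heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
    v = k_v * dist                            # forward command shrinks the distance
    w = k_w * heading_err                     # steering command points at the goal
    return v, w, False
```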
  • As described above, in the first embodiment, each of the light receiving units on the imaging device includes two or more light receiving elements, and the three-dimensional points obtained from the depth map acquired by such an imaging unit are used together with the three-dimensional points of the point cloud serving as map information.
  • the position and orientation of the imaging unit are calculated so as to minimize the distance between those three-dimensional points.
  • The imaging unit 110 calculates the depth map D154d, and the input unit 1110 in the information processing apparatus inputs the depth map.
  • the input unit 1110 can input the point cloud calculated by the imaging unit 110.
  • the calculation unit 1120 can perform position and orientation calculation using the point cloud input by the input unit 1110.
  • The point cloud calculated by the imaging unit 110 is the three-dimensional point group D154c in FIG. 4.
  • Alternatively, the calculation unit 1120 may obtain the depth map by the corresponding point search and the stereo method. Furthermore, in addition to these, the input unit 1110 may also input an RGB image or a gray image acquired by the imaging unit 110 as visual information. That is, the calculation unit 1120 may perform the depth map calculation in place of the imaging unit 110.
  • the imaging unit 110 may further include a focus control mechanism that controls the focal length of the optical system, and the information processing apparatus may control the focus control.
  • the control unit 1140 of the information processing apparatus may calculate a control value (focus value) for adjusting the focus.
  • For example, a control value for adjusting the focus of the imaging unit 110 in accordance with the average or median depth of the depth map is calculated.
  • Instead of the present information processing apparatus adjusting the focus, the autofocus mechanism built into the imaging unit 110 may perform the adjustment. By adjusting the focus, more sharply focused visual information can be obtained, and the position and orientation can be calculated with high accuracy.
  • the imaging unit 110 may have a configuration (focus fixed) without the focus control function. In this case, the imaging unit 110 can be downsized because it is not necessary to mount the focus control mechanism.
  • the imaging unit 110 may further include a zoom control mechanism that controls the zoom of the optical system, and the information processing apparatus may perform this zoom control.
  • Specifically, the control unit 1140 calculates a control value (adjustment value) that adjusts the zoom toward the wide-angle side so that visual information with a wide field of view is acquired, or a control value (adjustment value) that adjusts the zoom toward the telephoto side so that visual information with a narrow field of view is acquired at high resolution.
  • Although the imaging unit 110 has been described on the assumption that its optical system fits the pinhole camera model, any optical system (lens) may be used as long as it can acquire the position and orientation of the imaging unit 110 and visual information for controlling the moving body. Specifically, an omnidirectional lens, a fisheye lens, or a hyperboloid mirror may be used.
  • A macro lens may also be used. For example, if an omnidirectional lens or a fisheye lens is used, depth values over a wide field of view can be acquired, which improves the robustness of position and orientation estimation. A detailed position and orientation can be calculated by using a macro lens.
  • the user can freely change (exchange and the like) the lens in accordance with the scene to be used, and the position and orientation of the imaging unit 110 can be stably calculated with high accuracy.
  • the moving body can be controlled stably and with high accuracy.
  • In addition, the imaging unit 110 reads a control value defined by the rotation angle or movement amount of the optical system control motor provided for controlling the focus and the angle of view, and then calculates the focal length with reference to a lookup table (not shown).
  • the imaging unit 110 reads the focal length value recorded in the lens through the electronic contact given to the lens.
  • a person can also input a focal length to the imaging unit 110 using a UI (not shown).
  • The imaging unit 110 calculates the depth map using the focal length value acquired in this manner. The input unit 1110 of the information processing apparatus then inputs the focal length value from the imaging unit 110 together with the visual information.
  • the calculation unit 1120 calculates the position and orientation using the depth map input by the input unit 1110 and the focal length value.
  • the imaging unit 110 can also calculate a point cloud in the imaging unit 110 coordinate system using the calculated focal length.
  • The input unit 1110 of the information processing apparatus inputs the point cloud calculated by the imaging unit 110, and the calculation unit 1120 calculates the position and orientation using the point cloud input by the input unit 1110.
  • the map information in the present embodiment is a point cloud.
  • any information may be used as long as it is an index for calculating the position and orientation of the imaging unit 110.
  • it may be a point cloud with color information in which three values, which are color information, are added to each point of the point cloud.
  • the depth map may be associated with the position and orientation to form a key frame, and a plurality of key frames may be held. At this time, the position and orientation are calculated so as to minimize the distance between the depth map of the key frame and the depth map acquired by the imaging unit 110.
  • the calculation unit 1120 may store the input image in association with the key frame.
  • a configuration may be adopted in which a 2D map in which an area through which the AGV can pass and an impassable place such as a wall are associated is held. The usage of the 2D map will be described later.
  • Although the position and orientation calculation in the present embodiment has been described using the ICP algorithm, any method may be used as long as the position and orientation can be calculated. That is, instead of using the point cloud directly as described in the present embodiment, the calculation unit 1120 may compute a mesh model from the data and calculate the position and orientation so as to minimize the distance between surfaces. Alternatively, three-dimensional edges, which are discontinuity points, may be calculated from the depth map and the point cloud, and the position and orientation may be calculated so that the distance between the three-dimensional edges is minimized. In addition, if the input unit 1110 is configured to input an image, the calculation unit 1120 can also calculate the position and orientation by further using the input image.
  • Furthermore, the input unit 1110 may also input sensor values from a sensor mounted on the moving body.
  • the calculating unit 1120 can also calculate the position and orientation of the imaging unit 110 by using the sensor values. Specifically, related techniques are known as Kalman Filter and Visual Inertial SLAM, and these can be used. As described above, by using the visual information and the sensor information of the imaging unit 110 in combination, the position and orientation can be calculated robustly with high accuracy.
  • an inertial sensor such as a gyro or an IMU can be used to reduce blurring of visual information captured by the imaging unit 110.
  • The control unit 1140 in the present embodiment calculates the control value so that the distance between the target position and orientation and the position and orientation calculated by the calculation unit 1120 decreases. However, as long as the control unit 1140 calculates a control value for reaching the destination, it may calculate or use any control value. Specifically, when the depth value of the depth map, which is the input geometric information, is less than a predetermined distance, the control unit 1140 calculates a control value such as turning to the right, for example. In addition, the calculation unit 1120 may generate a route by dynamic programming, treating the parts of the map information held by the holding unit 1130 where the point cloud exists as impassable and the remaining space as passable, and the control unit 1140 may calculate control values so as to follow this route.
  • Specifically, the calculation unit 1120 projects the point cloud serving as map information onto the ground plane in advance to create a 2D map.
  • A point onto which the point cloud is projected is treated as impassable, such as a wall or an obstacle, and a point onto which nothing is projected is treated as passable free space.
  • Dynamic programming can then generate a route to the destination.
  • the calculation unit 1120 calculates a cost map that stores values that decrease as it approaches the destination, and the control unit 1140 receives this as input to output a control value.
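The 2D-map and cost-map idea can be illustrated as follows. Breadth-first search is used here as a simple stand-in for the dynamic programming mentioned above; the grid size, cell resolution, and function names are assumptions for illustration.

```python
from collections import deque
import numpy as np

def occupancy_grid(points_xyz, cell=0.1, size=200):
    """Project map points onto the ground plane into a boolean occupancy grid."""
    grid = np.zeros((size, size), dtype=bool)
    ij = np.floor(points_xyz[:, :2] / cell).astype(int) + size // 2
    ok = (ij >= 0).all(1) & (ij < size).all(1)
    grid[ij[ok, 0], ij[ok, 1]] = True          # occupied: wall or obstacle
    return grid

def cost_map(grid, goal_ij):
    """Values decrease toward the goal; the control unit can descend this gradient."""
    cost = np.full(grid.shape, np.inf)
    cost[goal_ij] = 0.0
    q = deque([goal_ij])
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < grid.shape[0] and 0 <= nj < grid.shape[1]
                    and not grid[ni, nj] and cost[ni, nj] == np.inf):
                cost[ni, nj] = cost[i, j] + 1
                q.append((ni, nj))
    return cost
```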
  • Alternatively, a controller may be used to calculate the control value. By calculating control values that move the AGV while avoiding obstacles such as walls in this manner, the AGV can be operated stably and safely.
  • Note that the holding unit 1130 need not hold map information. Specifically, based on the visual information acquired by the imaging unit 110 at time t and at time t' one time step earlier, the calculation unit 1120 calculates the position and orientation at time t relative to time t'.
  • The position and orientation of the imaging unit 110 can then be calculated without map information by successively multiplying the position and orientation change matrices calculated by the calculation unit 1120 at each time step as described above. With such a configuration, the position and orientation can be calculated, and the moving body controlled, even on a computer with limited computational resources.
  • the holding unit 1130 holds the map information created in advance.
  • Alternatively, a SLAM (Simultaneous Localization and Mapping) configuration may be used, in which position and orientation estimation is performed while map information is created based on the visual information acquired by the imaging unit 110 and the position and orientation calculated by the calculation unit 1120.
  • Many SLAM methods have been proposed and can be used. For example, it is possible to use a Point-Based Fusion algorithm that integrates the point clouds acquired by the imaging unit 110 at multiple times in time series. It is also possible to use a KinectFusion algorithm that integrates the measured boundaries between objects and space into voxel data in time series.
  • In addition, RGB-D SLAM algorithms that generate a map while tracking feature points detected from the image together with the depth values of the depth sensor are known, and these can also be used.
  • the maps are not limited to those generated in the same time zone. For example, time zones may be changed to generate multiple maps, and these may be synthesized.
  • The map information is not limited to being generated from data acquired by the imaging unit 110 mounted on the moving body.
  • For example, the holding unit 1130 may hold a CAD drawing or a map image of the environment as it is, or after converting its data format.
  • the holding unit 1130 may hold a map based on a CAD drawing or a map image as an initial map, and update the map using the above-described SLAM technology.
  • The control unit 1140 may also hold the map update times and calculate control values for controlling the AGV so as to update the map at points where a predetermined time has elapsed since the last update.
  • the map may be updated by overwriting, or the initial map may be held and the difference may be stored as update information.
  • the map can be managed in layers and checked on the display unit H16 or can be returned to the initial map. Convenience is improved by performing the operation while looking at the display screen.
  • the moving object is operating based on the destination coordinates set by the moving object management system 13.
  • The position and orientation and the control value calculated by the information processing apparatus can be transmitted to the mobile management system through the communication I/F (H17).
  • The mobile management system 13 and the process management system 14 can refer to the positions and orientations and the control values calculated on the basis of the visual information acquired by the imaging unit 110, and can thereby perform process management and mobile body management more efficiently.
  • Alternatively, the holding unit 1130 may be configured not to hold the destination coordinates but to receive them at any time via the communication I/F.
  • In the present embodiment, the process management system 14 manages the entire factory process, the mobile management system 13 manages the operation information of the mobile bodies in accordance with the management status, and the mobile body 12 moves in accordance with the operation information.
  • any configuration may be employed as long as the moving body moves based on the visual information acquired by the imaging unit 110.
  • For example, the process management system and the mobile management system may be omitted if two predetermined points are held in advance in the holding unit 1130 and the moving body simply travels back and forth between them.
  • the moving body 12 is not limited to the carrier vehicle (AGV).
  • the mobile unit 12 may be an autonomous vehicle or an autonomous mobile robot, and the movement control described in the present embodiment may be applied to them.
  • If the above-described information processing apparatus is mounted on a car, it can also be used as a car that realizes automated driving.
  • the vehicle is moved using the control value calculated by the control unit 1140.
  • the position and orientation may be calculated based on the visual information acquired by the imaging unit 110 instead of controlling the moving body.
  • The method of the present embodiment can also be applied to aligning real space and a virtual object in a mixed reality system, that is, to measuring the position and orientation of the imaging unit 110 in real space for use in drawing the virtual object.
  • For example, a 3DCG model is aligned with and synthesized onto the image D154a captured by the imaging unit 110, on the display of a mobile terminal such as a smartphone or tablet.
  • In this case, the input unit 1110 inputs the image D154a in addition to the depth map D154d acquired by the imaging unit 110.
  • the holding unit 1130 further holds the 3DCG model of the virtual object and the three-dimensional position at which the 3DCG model is installed in the map coordinate system.
  • The calculation unit 1120 combines the 3DCG model with the image D154a using the position and orientation of the imaging unit 110 calculated as described in the first embodiment. By doing this, a user experiencing mixed reality can hold the mobile terminal and stably observe, through its display, the real space on which the virtual object is superimposed based on the position and orientation calculated by the information processing apparatus.
  • the position and orientation of the imaging unit are calculated using the depth map acquired by the imaging unit.
  • An imaging unit based on dual pixel autofocus (DAF) can measure a specific distance range from the imaging unit with high accuracy. Therefore, in the second embodiment, even when the distance from the imaging unit is outside this range, depth values are calculated by motion stereo to further increase the accuracy of the depth map acquired by the imaging unit, so that the position and orientation can be calculated stably and with high accuracy.
  • the configuration of the device in the second embodiment is the same as that of FIG. 2 showing the configuration of the information processing device 10 described in the first embodiment, and thus the description thereof is omitted.
  • the input unit 1110 inputs visual information to the holding unit 1130, and the holding unit 1130 holds the visual information, which is different from the first embodiment.
  • the second embodiment differs from the first embodiment in that the calculating unit 1120 corrects the depth map using the visual information held by the holding unit 1130 and calculates the position and orientation. Further, it is assumed that the holding unit 1130 holds a list in which the reliability of the depth value of the depth map acquired by the imaging unit 110 is associated in advance as the characteristic information of the imaging unit 110.
  • The reliability of a depth value is obtained by photographing a flat panel placed at a predetermined distance from the imaging unit 110 in advance, taking the reciprocal of the error between the actual distance and the measured distance, and clipping the result to the range 0 to 1. It is assumed that the reliability has been calculated in advance for various distances. A point where measurement could not be made is given a reliability of 0.
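The reliability calibration described above can be sketched as follows, with hypothetical names: for each calibrated panel distance, take the reciprocal of the absolute measurement error, clip it to [0, 1], and assign 0 where no measurement was obtained.

```python
import numpy as np

def reliability_table(true_dists_mm, measured_dists_mm):
    """measured_dists_mm uses NaN where the point could not be measured."""
    true_d = np.asarray(true_dists_mm, dtype=float)
    meas_d = np.asarray(measured_dists_mm, dtype=float)
    err = np.abs(true_d - meas_d)
    rel = np.clip(1.0 / np.maximum(err, 1e-6), 0.0, 1.0)   # reciprocal error, clipped
    rel[np.isnan(meas_d)] = 0.0                             # unmeasured -> reliability 0
    return dict(zip(true_d.tolist(), rel.tolist()))
```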
  • visual information acquired by the imaging unit 110 and input by the input unit 1110 is an image and a depth map.
  • the procedure of the entire processing in the second embodiment is the same as that in FIG. 4 showing the processing procedure of the information processing apparatus 10 described in the first embodiment, and thus the description will be omitted.
  • the second embodiment differs from the first embodiment in that the depth map correction step is added before the position and orientation calculation step S140.
  • FIG. 7 is a flowchart showing details of the processing procedure in the depth map correction step.
  • In step S2110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 from the holding unit 1130.
  • In step S2120, the calculation unit 1120 calculates depth values by motion stereo, using the image held by the holding unit 1130 (visual information acquired at an arbitrary time t' before the time t at which the imaging unit 110 acquired the current visual information) together with the input image and depth map.
  • Hereinafter, an image that is visual information acquired at an arbitrary time t' before time t is also referred to as a past image.
  • Similarly, a depth map acquired at an arbitrary time t' before time t is also referred to as a past depth map.
  • Motion stereo is a known technique and various methods can be applied.
  • Although a scale ambiguity remains in the depth values obtained by motion stereo from two images, the scale can be determined based on the ratio between the past depth map and the depth values calculated by motion stereo.
  • In step S2130, the calculation unit 1120 updates the depth map with a weighted sum using the reliability associated with the depth values, which is the characteristic information read in step S2110, and the depth values calculated by motion stereo in step S2120. Specifically, letting the reliability value in the vicinity of each depth value d of the depth map be the weight α, each depth value is corrected with the depth value m calculated by motion stereo using the weighted sum of Equation 2.
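Equation 2 is not reproduced in this text; a weighted sum consistent with the description (reliability α weighting the sensor depth d against the motion-stereo depth m) would be:

$$d' = \alpha\, d + (1 - \alpha)\, m$$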
  • After that, the depth map correction step ends, and the processing from step S150 described in the first embodiment is continued.
  • In this way, when the imaging unit 110 can acquire depth values with high accuracy, the weight of the depth values acquired by the imaging unit 110 is large; otherwise, the weight of the depth values calculated by motion stereo is increased. As a result, even if the measurement accuracy of the imaging unit 110 is low, the depth map can be corrected by motion stereo and calculated with high accuracy.
  • the reliability in correction of the depth map is calculated from the measurement error of the depth value of the depth map calculated by the imaging unit 110, and is used as the weight ⁇ .
  • However, any method may be used as long as the depth map acquired by the imaging unit 110 and the depth values calculated by motion stereo are integrated with weight values that improve the depth map.
  • For example, a value obtained by multiplying the reciprocal of the depth value of the depth map by a predetermined coefficient β may be used as the weight.
  • the gradient of the input image may be calculated, and the inner product of the gradient direction and the arrangement direction of the elements in the imaging unit 110 may be used as a weight.
  • Alternatively, the input unit 1110 may further receive from the imaging unit 110 the baseline length of the two images D154a and D154b and of the parallax image D154f, and the ratio between this baseline length and the baseline length of the motion stereo may be used as the weight.
  • In addition, only specific pixels may be integrated, or the weighted sum may be calculated by applying the same weight to some or all of the pixels.
  • motion stereo may be performed using images and depth maps at a plurality of times in addition to a certain past one time.
  • Furthermore, the AGV can be controlled so as to obtain visual information with which the position and orientation can be calculated more accurately and robustly.
  • Specifically, the control unit 1140 calculates control values that move the AGV so that the baseline length of the motion stereo becomes large. One example is a control value that makes the vehicle travel in a serpentine manner while the imaging unit 110 keeps capturing a predetermined distant point. As a result, the baseline length of the motion stereo becomes long, and depth values farther away can be calculated accurately.
  • The control unit 1140 can also calculate control values so that the imaging unit 110 obtains visual information over a wider field of view. Specifically, the control value is calculated so as to perform a look-around motion centered on the optical center of the imaging unit 110. As a result, visual information with a wider field of view can be acquired, so the position and orientation can be calculated with less divergence and error in the optimization.
  • the input unit 1110 receives an image and position and orientation from another AGV through the communication I / F, and calculates the depth value by performing motion stereo using the received image and position and orientation and the image acquired by the imaging unit 110. It can also be done. Moreover, what is received may be anything as long as it is visual information, and may be a depth map, parallax image, or three-dimensional point group acquired by an imaging unit of another AGV.
  • the position and orientation and the control value are calculated based on visual information obtained by photographing the scene acquired by the imaging unit 110.
  • However, depth accuracy may decrease for textureless walls and columns. Therefore, in the third embodiment, depth accuracy is improved by projecting a predetermined pattern light onto the scene and having the imaging unit 110 capture the projected pattern.
  • the configuration of the information processing apparatus 30 in the present embodiment is shown in FIG. The difference is that the control unit 1140 in the information processing apparatus 10 described in the first embodiment further calculates a control value of the projection apparatus 310 and outputs the calculated control value.
  • The projection device in this embodiment is a projector, and it is attached so that the optical axis of the imaging unit 110 and the optical axis of the projection device coincide.
  • the pattern projected by the projection device 310 is a random pattern generated so that projected and non-projected regions exist at random.
  • the visual information is the image D 154 e and the depth map D 154 d acquired by the imaging unit 110, and the input unit 1110 inputs the image from the imaging unit 110.
  • In step S150, the calculation unit 1120 calculates a texture degree value indicating whether the input visual information is poor in texture, and the control unit 1140 controls the pattern projection ON/OFF based on the texture degree value.
  • This point differs from the first embodiment.
  • Specifically, the calculation unit 1120 convolves the input image with a Sobel filter and takes the absolute values of the result to calculate a gradient image.
  • the Sobel filter is a type of filter for calculating the first derivative of an image and is known in various documents.
  • the ratio of pixels equal to or greater than a predetermined gradient value threshold in the calculated gradient image is taken as the texture degree.
  • The control unit 1140 calculates a control value that turns the projection device on when the texture degree is less than a predetermined threshold, and turns it off when the texture degree is equal to or greater than the threshold.
  • In this way, when the scene is poor in texture, random pattern light is projected. As a result, a random pattern is added to the scene, so that even if the scene itself is poor in texture, the imaging unit can acquire the depth map more accurately. Therefore, the position and orientation can be calculated with high accuracy.
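The texture check and projector switching can be sketched as follows. The Sobel-based gradient and the ratio-of-strong-gradient-pixels definition follow the description above; the specific threshold values and function names are assumptions.

```python
import cv2
import numpy as np

def texture_degree(gray, grad_thresh=30.0):
    """Fraction of pixels whose gradient magnitude exceeds grad_thresh."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    grad = np.abs(gx) + np.abs(gy)
    return float(np.mean(grad >= grad_thresh))

def projector_should_be_on(gray, texture_thresh=0.05):
    """Project the pattern only when the scene is texture-poor."""
    return texture_degree(gray) < texture_thresh
```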
  • the pattern light is a random pattern.
  • any pattern may be used as long as it gives a texture to an area poor in texture.
  • For example, a random dot pattern or a fringe pattern (such as a stripe or lattice pattern) may be projected.
  • With a stripe pattern, there is an ambiguity in that distances inside and outside one modulation wavelength cannot be distinguished, but this can be eliminated by using a Gray code method that obtains depth values from input images acquired at multiple times with different frequencies.
  • In the present embodiment, the control unit 1140 outputs a control value for turning projection ON or OFF, and the projection device 310 switches the presence or absence of projection accordingly.
  • the configuration is not limited to this as long as the projection device 310 can project pattern light.
  • the projector 310 may be configured to start projection when the power is turned on in the initialization step S110.
  • the projection device 310 may be configured to project an arbitrary part of the scene.
  • the control unit 1140 can also switch the projection pattern of the projection device 310 so that the projection device 310 projects only in a region where the gradient value of the gradient image is less than a predetermined threshold.
  • In combination with the object detection described in the fifth embodiment, it is also possible to detect human eyes and calculate a control value so that the pattern is projected while avoiding them.
  • Not only the ON/OFF state but also the brightness of the pattern may be changed. That is, the control unit 1140 can calculate a control value such that the projection device 310 projects more brightly onto areas with larger (brighter) values in the depth map, or a control value such that dark parts of the input image are projected more brightly.
  • Alternatively, the pattern may be changed when the residual error of the iterative calculation performed by the calculation unit 1120 to compute the position and orientation is equal to or greater than a predetermined threshold.
  • In the present embodiment, the texture degree value is based on the gradient image obtained with a Sobel filter. Alternatively, the texture degree value may be calculated using a gradient image or an edge image computed with a filter such as a Prewitt filter, a Scharr filter, or a Canny edge detector. A high-frequency component obtained by applying a DFT (discrete Fourier transform) to the image may also be used as the texture degree value, or feature points such as corners in the image may be detected and their number used as the texture degree.
  • the position and orientation and the control value are calculated based on visual information obtained by photographing the scene acquired by the imaging unit.
  • the pattern light is projected to improve the accuracy for a scene with poor texture.
  • a method will be described in which three-dimensional information representing the position of a scene measured by another three-dimensional sensor is additionally used.
  • the configuration of the information processing apparatus 40 in the present embodiment is shown in FIG. This embodiment differs from the first embodiment in that the input unit 1110 in the information processing apparatus 10 described in the first embodiment further inputs three-dimensional information from the three-dimensional measurement apparatus 410.
  • the three-dimensional measurement device 410 in the present embodiment is a 3D LiDAR (light detection and ranging), which is a device that measures the distance based on the round trip time of the laser pulse.
  • the input unit 1110 inputs a measurement value acquired by the three-dimensional device as a point cloud.
  • The holding unit 1130 holds a list in which the reliability of the depth values of the depth map acquired by the imaging unit 110 is associated in advance, and a list in which the reliability of the depth values of the three-dimensional measurement device 410 is associated. It is assumed that these reliabilities are calculated in advance for both the imaging unit 110 and the three-dimensional measurement device 410 by the method described in the second embodiment.
  • the procedure of the entire processing in the fourth embodiment is the same as that of FIG. 4 showing the processing procedure of the information processing apparatus 10 described in the first embodiment, and thus the description will be omitted.
  • The fourth embodiment differs from the first embodiment in that a depth map correction step is added before the position and orientation calculation step S140.
  • FIG. 10 is a flowchart showing details of the processing procedure in the depth map correction step.
  • In step S4110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 and the three-dimensional measurement apparatus 410 from the holding unit 1130.
  • In step S4120, the calculation unit 1120 integrates the depth map calculated by the imaging unit 110 with the point cloud measured by the three-dimensional measurement device 410, using the reliabilities associated with the depth values, which are the characteristic information read in step S4110. Specifically, the depth map can be updated by replacing the value m in Equation 2 with the depth value measured by the three-dimensional measurement device 410.
  • The weight w is calculated by Equation 3, where σ_D denotes the reliability of the depth map and σ_L denotes the reliability of the point-cloud point corresponding to the same location.
  • the depth map is updated by equation 2 using the calculated weights.
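  • Since Equations 2 and 3 are not reproduced here, the following sketch shows one commonly used reliability-weighted fusion, under the assumption that higher reliability values indicate more trustworthy depth; the weight form and the NaN handling are assumptions, not necessarily the embodiment's exact equations.

```python
import numpy as np


def fuse_depth(d_cam: np.ndarray, d_lidar: np.ndarray,
               sigma_d: np.ndarray, sigma_l: np.ndarray) -> np.ndarray:
    """Reliability-weighted per-pixel fusion of two depth estimates.

    d_cam / sigma_d  : depth and reliability from the imaging unit
    d_lidar / sigma_l: depth and reliability from the 3D measurement device
    Pixels the 3D sensor does not observe are marked NaN in d_lidar.
    """
    w_lidar = sigma_l / (sigma_d + sigma_l)              # assumed form of Equation 3
    fused = (1.0 - w_lidar) * d_cam + w_lidar * d_lidar  # assumed form of Equation 2
    return np.where(np.isnan(d_lidar), d_cam, fused)
```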
  • The depth map correction step then ends, and the processing from step S150 described in the first embodiment continues.
  • As described above, when the imaging unit can acquire the depth value with high accuracy, the weight of the depth value acquired by the imaging unit is increased, and when the three-dimensional measurement apparatus can acquire the depth value with high accuracy, the weight of the depth value acquired by the three-dimensional measurement device is increased.
  • In this way, the depth map is built from whichever of the imaging unit and the three-dimensional measurement apparatus can measure each depth value more accurately, and the position and orientation can be calculated with high accuracy.
  • the three-dimensional measurement device 410 is not limited to this, as long as it can measure three-dimensional information that can increase the accuracy of visual information acquired by the imaging unit 110.
  • it may be a TOF (Time Of Flight) distance measurement camera, or may be a stereo camera provided with two cameras.
  • Alternatively, a stereo configuration may be adopted in which a monocular camera different from the imaging unit 110 (which obtains depth by DAF) is arranged in alignment with the optical axis of the imaging unit 110.
  • An imaging unit 110 having different reliability characteristics may be further mounted, and this may be regarded as the three-dimensional measuring device 410 to similarly update the depth map.
  • the position and orientation and the control value are calculated based on visual information obtained by photographing the scene acquired by the imaging unit 110.
  • predetermined pattern light is projected onto the scene.
  • the three-dimensional shape measured by the three-dimensional measurement apparatus is used together.
  • an object is detected from visual information and used to control a moving object.
  • A case will be described in which the AGV loads and carries goods and, on reaching its destination, must stop precisely at a predetermined position with respect to a shelf or a belt conveyor.
  • A method of controlling the AGV precisely by calculating the position and orientation of an object such as a shelf or a belt conveyor imaged by the imaging unit 110 will be described.
  • the feature information of an object is the position and orientation of the object.
  • the configuration of the device according to the fifth embodiment is the same as that of FIG.
  • the calculating unit 1120 further detects an object from visual information, and the control unit 1140 controls the moving body so that the detected object appears at a predetermined position in the visual information.
  • The holding unit 1130 holds an object model for object detection, and also holds a target position and orientation with respect to the object, that is, the position and orientation the AGV should take with respect to the object when it arrives at its destination.
  • the above points differ from the first embodiment.
  • The object model consists of a CAD model representing the shape of the object and a list storing PPF (Point Pair Feature) information, in which the relative position and normals of pairs of three-dimensional points on the object are used as three-dimensional features.
  • FIG. 11 is a flowchart illustrating the details of the object detection step.
  • step S5110 the calculation unit 1120 reads the object model held by the holding unit 1130.
  • In step S5120, the calculation unit 1120 detects where in the visual information (the depth map) an object that fits the object model appears. Specifically, PPF features are first calculated from the depth map. Then, by matching the PPFs detected from the depth map with the PPFs of the object model, an initial value of the object's position and orientation with respect to the imaging unit 110 is calculated.
  • In step S5130, using the position and orientation of the object with respect to the imaging unit 110 calculated by the calculation unit 1120 as the initial value, the position and orientation of the object with respect to the imaging unit 110 are refined by the ICP algorithm. At the same time, the residual between the object's pose and the target position and orientation held by the holding unit 1130 is calculated. The calculation unit 1120 inputs the calculated residual to the control unit 1140, and the object detection step ends.
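  • As an illustration of the ICP refinement in step S5130, the following is a minimal point-to-point ICP sketch, not the embodiment's implementation; the use of scipy's cKDTree, the fixed iteration count, and the function names are assumptions, and the initial pose T_init is assumed to come from the PPF matching of step S5120.

```python
import numpy as np
from scipy.spatial import cKDTree


def icp_refine(src: np.ndarray, dst: np.ndarray,
               T_init: np.ndarray, iters: int = 30) -> np.ndarray:
    """Point-to-point ICP refining an initial 4x4 pose T_init that maps the
    object model points `src` (N, 3) onto observed depth-map points `dst` (M, 3)."""
    T = T_init.copy()
    tree = cKDTree(dst)
    for _ in range(iters):
        src_t = (T[:3, :3] @ src.T).T + T[:3, 3]
        _, idx = tree.query(src_t)            # nearest observed point per model point
        matched = dst[idx]
        # Kabsch: best rigid transform for the current correspondences
        mu_s, mu_m = src_t.mean(axis=0), matched.mean(axis=0)
        H = (src_t - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_m - R @ mu_s
        dT = np.eye(4)
        dT[:3, :3], dT[:3, 3] = R, t
        T = dT @ T                            # accumulate the incremental update
    return T
```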
  • In step S150 in FIG. 5, the control unit 1140 calculates the control value of the actuator 120 so that the AGV moves in the direction in which the residual of the object's position and orientation calculated by the calculation unit 1120 decreases.
  • As described above, an object shown in the depth map acquired by the imaging unit, in which each light receiving unit on the image sensor includes two or more light receiving elements, is detected, and its position and orientation are calculated by model fitting. Then, the AGV is controlled so that the difference between the position and orientation given in advance with respect to the object and the detected position and orientation of the object becomes small. That is, the AGV is controlled so that it is precisely aligned with the object.
  • Because the shape of the object is known in advance, its position and orientation can be calculated with high accuracy, and the AGV can therefore be controlled with high accuracy.
  • the PPF feature is used to detect an object.
  • any method capable of detecting an object may be used.
  • As the feature quantity, a SHOT feature may be used, which describes a histogram of the inner products between the normal of a three-dimensional point and the normals of the three-dimensional points located around it.
  • Alternatively, a Spin Image feature, in which the surrounding three-dimensional points are projected onto a cylindrical surface whose axis is the normal vector of a given three-dimensional point, may be used.
  • a learning model by machine learning can also be used as a method of detecting an object without using a feature amount.
  • a neural network learned so that an object area is 1 and non-object areas are 0 can be used as a learning model.
  • the position and orientation of the object may be calculated by combining steps S5110 to S5130.
  • a shelf or a belt conveyor is used as an example of the object.
  • it may be any object as long as the imaging unit 110 can observe when the AGV is stopped and the relative position and orientation (relative position and relative orientation) are uniquely determined.
  • For example, a three-dimensional marker, specifically an object of arbitrary shape with arbitrary asperities printed by a 3D printer, may be used.
  • A depth map captured in advance while stopped at the target position and orientation may also be used as the object model. In that case, during AGV operation, the AGV may be controlled so that the position and orientation error between the held depth map and the depth map input by the input unit 1110 decreases. In this way, an object model can be obtained without the trouble of creating a CAD model.
  • a method of detecting an object and performing model fitting for position and orientation calculation for exact positioning of the AGV is illustrated.
  • it may be used not only for the purpose of exact position and orientation calculation but also for collision avoidance and position and orientation detection of other AGVs.
  • For example, the control unit 1140 can calculate control values so as to avoid the coordinates of other AGVs and thus avoid colliding with them.
  • an alert may be presented, and another AGV may be instructed to clear its own traveling route.
  • The control unit 1140 may also calculate control values so that the AGV moves to and connects with a charging station. In addition, when wiring is laid in a passage in the factory, the calculation unit 1120 may detect the wiring and the control unit 1140 may calculate a control value that bypasses it. If the ground is uneven, the control value may be calculated so as to avoid the unevenness. Further, if labels such as entry prohibited or recommended route are associated with each object model, whether or not the AGV may pass can be set easily simply by arranging the corresponding objects in the scene.
  • the object model is a CAD model.
  • any model may be used as long as the position and orientation of the object can be calculated.
  • it may be a mesh model generated by three-dimensional reconstruction of a target object from stereo images taken at a plurality of viewpoints by the Structure From Motion algorithm.
  • it may be a polygon model created by integrating depth maps captured from a plurality of viewpoints with an RGB-D sensor.
  • A learning model such as a CNN (Convolutional Neural Network) may also be used.
  • The imaging unit 110 may also image the object carried by the AGV, the calculation unit 1120 may recognize it, and the control unit 1140 may calculate the control value according to the type of the mounted object. Specifically, if the mounted object is fragile, the control value is calculated so that the AGV moves at low speed. In addition, if a list associating a target position and orientation with each object is held in the holding unit 1130 in advance, the control value may be calculated so as to move the AGV to the target position associated with the mounted object.
  • the control unit 1140 may calculate the control value of the robot arm such that the robot arm acquires the object.
  • the method of calculating the control value of the moving object by stably calculating the position and orientation with high accuracy based on the visual information acquired by the imaging unit 110 has been described.
  • the fifth embodiment has described the method of detecting an object from visual information and using it to control a moving object.
  • a method of stably performing control of AGV and generation of map information with high accuracy using the result of dividing input visual information into regions will be described.
  • the present embodiment exemplifies a method of adapting upon generation of map information.
  • In the map information, static objects whose position and orientation do not change over time are registered, and using these to calculate the position and orientation improves robustness to changes of the scene. Therefore, the visual information is divided into semantic regions, and the kind of object each pixel belongs to is determined. A method of generating hierarchical map information using per-object-type "stationary object likelihood" information calculated in advance, and a position and orientation estimation method using it, will be described.
  • the feature information of an object is the type of the object unless otherwise noted.
  • the configuration of the apparatus in the sixth embodiment is the same as that of FIG. 2 showing the configuration of the information processing apparatus 10 described in the first embodiment, and thus the description thereof is omitted.
  • the calculating unit 1120 further divides visual information into semantic regions, and generates map information hierarchically using them.
  • The hierarchical map information in this embodiment is a point cloud composed of four layers: (1) the layout CAD model of the factory, (2) a stationary object map, (3) a fixture map, and (4) a moving object map.
  • the holding unit 1130 holds (1) the layout CAD model of the factory in the external memory H14. Also, the position and orientation are calculated using hierarchically created map information. The position and orientation calculation method will be described later.
  • visual information acquired by the imaging unit 110 and input by the input unit 1110 is an image and a depth map.
  • the holding unit 1130 also holds a CNN, which is a learning model learned so as to output, for each object type, a mask image indicating whether each pixel is a corresponding object when an image is input.
  • In addition, a look-up table is held that indicates to which of the layers (2) to (4) each object type belongs, so that when an object type is specified, the layer to which it belongs is known.
  • the diagram of the procedure of the entire process in the sixth embodiment is the same as FIG. 4 showing the procedure of the information processing apparatus 10 described in the first embodiment, and therefore the description thereof is omitted.
  • The present embodiment differs from the first embodiment in that, when calculating the position and orientation, the calculation unit 1120 takes the layers of the map information held by the holding unit 1130 into account. It further differs in that an area division / map generation step is added after the position and orientation calculation step S140. Details of these processes will be described later.
  • In step S140, the calculation unit 1120 assigns, to the point cloud of each layer of the map information held by the holding unit 1130, a weight that serves as its degree of contribution to the position and orientation calculation, and then calculates the position and orientation. Specifically, when the layers (1) to (4) are held as in the example of the present embodiment, the weights are successively reduced from layer (1) toward layer (4), whose map information is less static. A weighting sketch is given below.
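  • As an illustration of assigning per-layer weights, the following is a minimal sketch; the layer-to-weight mapping, the numeric values, and the weighted-ICP cost mentioned in the comment are assumptions, not values from the embodiment.

```python
import numpy as np

# Hypothetical per-layer weights: (1) factory CAD, (2) stationary objects,
# (3) fixtures, (4) moving objects. The numeric values are assumptions.
LAYER_WEIGHTS = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.1}


def point_weights(point_layers: np.ndarray) -> np.ndarray:
    """Per-point contribution weights from the layer index stored with each
    map point; usable, e.g., in a weighted ICP cost sum_i w_i * ||T p_i - q_i||^2.
    Unknown layers fall back to the moving-object weight (an assumption)."""
    return np.array([LAYER_WEIGHTS.get(int(l), 0.1) for l in point_layers])
```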
  • FIG. 12 is a flowchart illustrating the details of the area division / map generation step. This area division / map generation step is added and executed immediately after the position and orientation calculation step S140 in FIG.
  • step S6110 the calculation unit 1120 divides the input image into semantic regions.
  • a number of approaches have been proposed for semantic domain segmentation, which can be incorporated.
  • The method is not limited to the above as long as the image is divided into semantic regions. These methods yield, for each object type, a mask image in which each pixel is labeled as belonging to that object or not.
  • In step S6120, the depth map is divided into areas. Specifically, a normal is first calculated for each pixel of the depth map, and normal edges are detected at pixels whose inner product with the surrounding normals is equal to or less than a predetermined value. Then, with these normal edges as boundaries, different labels are assigned to the respective areas, dividing the depth map into areas and producing an area-divided image.
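  • The following is a minimal sketch of such normal-based area division of a depth map; the pinhole back-projection with the principal point at the image center, the intrinsics fx and fy, the inner-product threshold, and the use of scipy's connected-component labeling are all assumptions for illustration.

```python
import numpy as np
from scipy import ndimage


def normals_from_depth(depth: np.ndarray, fx: float, fy: float) -> np.ndarray:
    """Per-pixel normals from a depth map via the cross product of the
    horizontal and vertical 3D gradients (intrinsics fx, fy assumed)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2) * depth / fx
    y = (v - h / 2) * depth / fy
    pts = np.dstack([x, y, depth])
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-9
    return n


def segment_by_normals(depth: np.ndarray, fx: float, fy: float,
                       dot_thresh: float = 0.95) -> np.ndarray:
    """Label regions separated by normal edges (inner product below threshold)."""
    n = normals_from_depth(depth, fx, fy)
    dot_r = (n[:, :-1] * n[:, 1:]).sum(axis=2)   # neighbour to the right
    dot_d = (n[:-1, :] * n[1:, :]).sum(axis=2)   # neighbour below
    edge = np.zeros(depth.shape, dtype=bool)
    edge[:, :-1] |= dot_r < dot_thresh
    edge[:-1, :] |= dot_d < dot_thresh
    labels, _ = ndimage.label(~edge)             # connected non-edge regions
    return labels
```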
  • In step S6130, the calculation unit 1120 performs semantic area division of the point cloud based on the mask images obtained by dividing the input image into semantic regions and the area-divided image obtained by dividing the depth map. Specifically, the inclusion ratio N_i,j between the area S_Dj of the depth map and the object area S_Mi of the mask for object type i is calculated by Equation 4, where i is an object type and j is a label of the depth map area division.
  • The object type i is assigned to every depth map area S_Dj for which N_i,j is equal to or greater than a predetermined threshold.
  • background labels are assigned to pixels to which no object type has been assigned.
  • the object type i is assigned to each pixel of the depth map.
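  • A minimal sketch of this assignment by inclusion ratio is given below; since Equation 4 is not reproduced here, the ratio N_i,j = |S_Dj ∩ S_Mi| / |S_Dj| and the threshold value are assumed forms.

```python
import numpy as np


def assign_object_types(region_labels: np.ndarray, masks: dict,
                        threshold: float = 0.5) -> dict:
    """Assign an object type to each depth-map region by overlap ratio.

    region_labels : integer label image from the depth-map area division
    masks         : {object_type: boolean mask image} from semantic segmentation
    Regions without any ratio above the threshold stay background.
    """
    assignment = {}
    for j in np.unique(region_labels):
        if j == 0:                      # 0 treated as background/edges here
            continue
        region = region_labels == j
        area = region.sum()
        for obj_type, mask in masks.items():
            ratio = np.logical_and(region, mask).sum() / max(area, 1)
            if ratio >= threshold:
                assignment[j] = obj_type
                break
    return assignment
```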
  • In step S6140, the calculation unit 1120 hierarchically generates map information based on the object type labels assigned to the depth map in step S6130. Specifically, the look-up table is consulted for each object type label of the depth map, and the three-dimensional points obtained from the depth map are stored in the corresponding layer of the map information held by the holding unit 1130. When the storage is completed, the area division / map generation step ends.
  • As described above, non-moving objects suitable for position and orientation calculation and moving objects unsuitable for it can be registered separately in the map information. Using the layered map information, weights are then assigned so that moving objects contribute less, and the position and orientation are calculated according to these weights. In this way, the position and orientation can be calculated more stably and robustly.
  • the layers (1) to (4) are used.
  • the configuration is sufficient as long as the configuration has a plurality of layers according to the movement of the object, and the configuration may be such that the holding unit 1130 holds only an arbitrary number of layers (1) to (4).
  • Layers for specific objects (for example, an AGV layer and a human layer) may also be provided.
  • In the present embodiment, map information generation and position and orientation calculation are performed using the semantically segmented depth map.
  • the control unit 1140 may calculate the control value using the depth map divided into semantic regions. Specifically, when a person or another AGV is detected when the semantic region is divided, the control unit 1140 can calculate the control value so as to avoid them. By doing this, AGV can be operated safely. Also, the control unit 1140 may calculate a control value that follows a person or another AGV. By doing this, the AGV can operate even without map information. Furthermore, the calculation unit 1120 may recognize a human gesture based on the semantic region division result, and the control unit 1140 may calculate the control value.
  • For example, regions of the image are labeled by body parts such as the arms, fingers, head, torso, and legs, and gestures are recognized from their mutual positional relationship. If a beckoning gesture is recognized, a control value is calculated so as to approach the person; if a pointing gesture is recognized, a control value is calculated so as to move in the pointed direction. By recognizing human gestures in this way, the user can move the AGV without directly controlling it with a controller or the like, so the AGV can be operated with little effort.
  • The control unit 1140 may also calculate the control value according to the object type detected by the method of the present embodiment. Specifically, control is performed so as to stop if the object type is a person, and to avoid if the object type is another AGV. This makes it possible to operate the AGV efficiently: it never hits people, with whom collisions must absolutely be avoided, and it safely avoids non-human obstacles.
  • In the present embodiment, the AGV passively segments objects.
  • Alternatively, the AGV may ask people to move so that moving objects are excluded.
  • For example, the control unit 1140 calculates a control value for outputting, from a speaker (not shown), a voice message asking people to move. In this way, map information excluding moving objects can be generated.
  • semantic region division is performed to specify an object type.
  • the configuration may be such that the calculation unit 1120 calculates map information, position and orientation, and the control unit 1140 calculates control values without specifying the object type. That is, S6110 and S6120 in FIG. 12 can be removed.
  • For example, the depth map may be divided into areas by height from the ground, and the control value calculated while ignoring pixels at heights greater than the height of the AGV.
  • That is, point clouds at heights at which the AGV cannot collide are not used for route generation; a sketch is given below.
  • As a result, the number of points to be processed decreases, and the control value can be calculated at high speed.
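  • A minimal sketch of such height filtering, assuming an (N, 3) point array whose Y axis is the height above the ground:

```python
import numpy as np


def filter_points_by_height(points: np.ndarray, agv_height: float) -> np.ndarray:
    """Drop points above the AGV height before route generation.

    `points` is an (N, 3) array in a frame whose Y axis is the height from
    the ground (an assumption matching the embodiment's X-Z moving plane).
    """
    return points[points[:, 1] <= agv_height]
```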
  • Alternatively, the areas may be divided based on planarity. In this way, three-dimensional edges, which contribute strongly to the position and orientation calculation, can be used preferentially (and planes, in which ambiguity in the position and orientation remains, can be excluded from the processing), which improves robustness.
  • the weight of the moving object in the map information is reduced to reduce the degree of contribution of the position and orientation calculation.
  • the calculation unit 1120 divides the depth map into semantic regions according to the processing procedure of S6110 to S6130. Then, the weight is determined by referring to the look-up table based on the object type label of each pixel. Thereafter, in step S140, the position and orientation are calculated in consideration of the weight. As described above, the influence of moving objects in position and orientation calculation can be reduced without making the map into a layer structure. This can reduce the capacity of the map.
  • The imaging unit 110 in the present embodiment is not limited to an imaging unit in which each light receiving unit on the image sensor is constituted by two or more light receiving elements; anything that can acquire three-dimensional depth information, such as a TOF camera or 3D LiDAR, may be used.
  • In the present embodiment, the holding unit 1130 holds map information for each layer. These layers can be checked on the display unit H16 or reset to the initial map. By checking the layers on the display screen and instructing the AGV to regenerate the map if a moving object has been registered in it, the AGV can be operated easily and stably.
  • The map information created by the individual AGVs can be integrated by using the ICP algorithm to align the point clouds that refer to the same locations so that they coincide.
  • integration may be performed so as to leave newer map information by referring to the map creation time.
  • the control unit 1140 may move an AGV that has not been worked on so as to generate a map of an area for which the map information has not been updated for a while.
  • the calculation of the control value calculated by the control unit 1140 is not limited to the method described in the present embodiment as long as it is a method of calculating so as to approach the target position and orientation using map information.
  • the control value can be determined using a learning model for route generation.
  • For example, a DQN (Deep Q-Network) may be used as the learning model.
  • This can be realized by training a reinforcement learning model in advance so that the reward increases when the AGV approaches the target position and orientation, decreases when it moves away from the target position and orientation, and decreases when it approaches an obstacle; a reward-shaping sketch is given below.
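  • The following is a minimal reward-shaping sketch consistent with the description above; the coefficients and the obstacle radius are assumptions for illustration, not the embodiment's actual reward.

```python
import numpy as np


def reward(position: np.ndarray, goal: np.ndarray,
           obstacle_dist: float, prev_goal_dist: float) -> float:
    """Reward shaping for route-generation reinforcement learning:
    positive when the AGV gets closer to the target position, negative when
    it moves away, with an extra penalty when it comes near an obstacle."""
    goal_dist = float(np.linalg.norm(goal - position))
    r = 1.0 * (prev_goal_dist - goal_dist)   # progress toward the goal
    if obstacle_dist < 0.5:                  # assumed 0.5 m safety radius
        r -= 1.0                             # penalty for approaching an obstacle
    return r
```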
  • The use of the map information is not limited to this.
  • AGV transport simulation may be performed, and the process management system may generate processes so that the AGV can be transported efficiently.
  • the mobile management system may generate a route that avoids the AGV operation timing and congestion based on the map information.
  • the learning model described above may be learned along with the delivery simulation with the created map.
  • By reproducing and learning situations such as the placement of an obstacle or a collision with a person or another AGV in simulation, the control unit 1140 can stably calculate the control value using the learning model even when a similar situation actually occurs.
  • the learning model can be configured to learn the control method efficiently in a short time.
  • a UI that can be commonly applied to the first to sixth embodiments will be described. It will be described that the user confirms the visual information acquired by the imaging unit, the position / posture calculated by the calculation unit, the detection result of the object, the map information, and the like.
  • Although the AGV moves by automatic control, a case in which it is also controlled by user input will be described.
  • For example, a GUI is displayed on a display device so that the user can check the status of the AGV and control it, and operations from the user are input using an input device such as a mouse or a touch panel.
  • the display is mounted on the AGV, but the present invention is not limited to such a configuration.
  • The display of a mobile terminal owned by the user may be used as the display device via the communication I/F (H17), or a liquid crystal display connected to the mobile management system may be used as the display device.
  • display information can be generated by the information processing apparatus.
  • Alternatively, a computer attached to the display device may acquire the information required for generating the display information and generate the display information itself.
  • the configuration of the device according to the seventh embodiment is the same as that of FIG.
  • The present embodiment differs from the first embodiment in that the calculation unit 1120 generates display information based on the visual information acquired by the imaging unit 110, the position and orientation calculated by the calculation unit 1120, the detected objects, and the control value calculated by the control unit 1140, and presents it on a touch panel display or the like. The details of the display information will be described later. Further, in the present embodiment, the holding unit 1130 holds 2D map information and 3D map information.
  • FIG. 13 shows a GUI 100 which is an example of display information presented by the display device according to the present embodiment.
  • G110 is a window for presenting 2D map information.
  • G120 is a window for presenting 3D map information.
  • G130 is a window for presenting the image D154e acquired by the imaging unit 110.
  • G140 is a window for presenting the depth map D154d acquired by the imaging unit 110.
  • G150 is a window for presenting display information based on the position and orientation calculated by the calculation unit 1120 as described in the first embodiment, the objects detected as described in the fifth and sixth embodiments, and the control value calculated by the control unit 1140 as described in the first embodiment.
  • G110 shows an example of presentation of a 2D map held by the holding unit 1130.
  • G111 is an AGV on which the imaging unit 110 is mounted.
  • The calculation unit 1120 composites it onto the 2D map based on the position and orientation of the imaging unit (that is, the position and orientation of the AGV).
  • G112 is an example in which an alert is presented as a balloon when there is a possibility of a collision, based on the position and orientation of the object detected by the calculation unit 1120 according to the methods of the fifth and sixth embodiments.
  • G113 is an example in which an AGV planned route is presented as an arrow based on the control value calculated by the control unit 1140. In FIG. 13, the AGV is heading to the destination presented on G114.
  • the user can easily grasp the AGV operation status by presenting the 2D map, the position of the AGV, the detection result of the object, and the route.
  • G111 to G114 may allow the user to more easily understand the operation status by changing the color, the thickness of the line, and the shape.
  • G120 shows an example of presentation of the 3D map held by the holding unit 1130.
  • G121 is an example of visualizing the result of updating the 3D map held by the holding unit 1130 using the result of the calculation unit 1120 dividing the depth map into meaningful areas as described in the sixth embodiment.
  • Non-moving objects obtained from the factory CAD data are presented dark, while moving objects such as other AGVs and people are presented lighter.
  • In addition, the label of the object detected by the calculation unit 1120 is presented at G122.
  • the user can comprehend the operation status in consideration of the height direction in comparison with the 2D map.
  • Moreover, since object types found while the AGV is traveling are presented, they can be checked without going to the site.
  • G130 shows an example of presentation of the image acquired by the imaging unit 110.
  • A bounding box is superimposed as a dotted line around the outline of objects detected by the calculation unit 1120, such as another AGV or a person.
  • The line may instead be solid or double, or the object may be emphasized by changing its color.
  • the user can confirm the object detected by the calculation unit 1120 without any trouble.
  • G140 shows an example of presentation of the depth map acquired by the imaging unit 110.
  • G141 is an example in which the CAD model of the object held by the holding unit 1130 described in the fifth embodiment is superimposed as a wire frame using the position and orientation of the object calculated by the calculation unit 1120.
  • G142 is an example in which an AGV CAD model is superimposed as a wire frame.
  • G143 is an example in which a CAD model of a three-dimensional marker is superimposed.
  • G150 shows a GUI for manually operating the AGV, the values calculated by the calculation unit 1120 and the control unit 1140, and an example of presentation of the operation information of the AGV.
  • G151 is an emergency stop button, and the user can stop the movement of the AGV by touching the button with a finger.
  • G152 is a mouse cursor, which can move the cursor according to a user's touch operation through a mouse, a controller, and a touch panel (not shown), and can operate buttons and radio buttons in the GUI by pressing a button.
  • G153 is an example showing a controller of AGV. By moving the circle inside the controller up, down, left, and right, the user can perform the front, rear, left, and right movement of the AGV according to those inputs.
  • G154 is an example showing the internal state of AGV.
  • the AGV is illustrated as an example in which it is traveling automatically and operating at a speed of 0.5 m / s.
  • operational information such as the time since the AGV started to travel, the remaining time to the destination, and the difference in the estimated arrival time with respect to the schedule are also presented.
  • G156 is a GUI for setting the operation and display information of the AGV. The user can perform operations such as whether to generate map information and whether to present a detected object.
  • G157 is an example of presenting AGV operation information. In this example, the position and orientation calculated by the calculation unit 1120, the destination coordinates received from the mobile management system 13, and the name of the article being transported by the AGV are presented. By presenting the operation information together with the GUI that receives input from the user in this way, the AGV can be operated more intuitively.
  • The processing procedure of the information processing apparatus in the seventh embodiment differs from that of FIG. 5, which describes the processing procedure of the information processing apparatus 10 in the first embodiment, in that a display information generation step (not shown), in which the calculation unit 1120 generates display information, is newly added after step S160.
  • In the display information generation step, the display information is rendered based on the visual information captured by the imaging unit 110, the position and orientation calculated by the calculation unit 1120, the detected objects, and the control value calculated by the control unit 1140, and is output to the display device.
  • As described above, the calculation unit generates display information based on the visual information acquired by the imaging unit, the position and orientation calculated by the calculation unit, the detected objects, and the control value calculated by the control unit, and presents it on the display.
  • the user can easily check the state of the information processing apparatus.
  • The user can also input an AGV control value, various parameters, a display mode, and the like. This makes it possible to easily change the various settings of the AGV or to move it.
  • by presenting the GUI it becomes possible to easily operate the AGV.
  • The display device is not limited to the display. If a projector is mounted on the AGV, the display information can also be presented with the projector. In addition, if a display device is connected to the mobile management system 13, the display information may be transmitted to the mobile management system 13 via the communication I/F (H17) and presented there. It is also possible to transmit only the information necessary for generating the display information and have a computer inside the mobile management system 13 generate it. In this way, the user can check the operation status of the AGV and operate it without looking at the display device mounted on the AGV.
  • the display information in the present embodiment may be anything as long as it presents information handled by the present information processing.
  • For example, the time and frame rate of the position and orientation calculation, the remaining battery level of the AGV, and the like may be displayed.
  • The GUI described in the present embodiment is an example; any GUI may be used as long as it allows the user to grasp the operating status of the AGV and to perform operations (input) on it.
  • the display information can be changed such as changing color, switching line thickness, solid line, broken line, double line, scaling, and hiding unnecessary information.
  • the object model may display a contour instead of a wire frame, or a transparent polygon model may be superimposed. By changing the method of visualizing display information in this manner, the user can more intuitively understand the display information.
  • the GUI described in the present embodiment can also be connected to a server (not shown) via the Internet.
  • In this way, a person in charge at the AGV manufacturer can check the state of the AGV by acquiring the display information via the server, without going to the site.
  • the input device is exemplified by a touch panel, but any input device that receives an input from the user may be used. It may be a keyboard, a mouse, or a gesture (for example, it may be recognized from visual information acquired by the imaging unit 110). Furthermore, the mobile management system may be an input device via the network. In addition, if a smartphone or a tablet terminal is connected via the communication I / F (H17), they can also be used as a display device / input device.
  • the input device is not limited to the one described in the present embodiment, and anything may be used as long as it changes the parameters of the information processing apparatus.
  • the user's input may be accepted to change the upper limit (the upper limit of the speed) of the control value of the moving object, or the destination point clicked by the user may be input on G110.
  • The user's selection of which models to use for object detection and which not to use may also be input. Alternatively, the user may enclose, on G130, an object that could not be detected, and a learning device (not shown) may then train the learning model so that the object is detected from the visual information of the imaging unit 110.
  • In the sixth embodiment, the visual information obtained by the imaging unit 110 is divided into semantic regions, the type of object is determined for each pixel, a map is generated, and the map and the determined object types are used for control.
  • Embodiment 8 further describes a method of recognizing different semantic information depending on the situation even in the same object type, and controlling the AGV based on the recognition result.
  • the stacking degree of objects such as stacked products in a factory is recognized as semantic information. That is, it recognizes the semantic information of the object that is in the field of view of the imaging unit 110. Then, a method of controlling the AGV according to the stacking degree of the objects will be described. In other words, the AGV is controlled to more safely avoid objects that are stacked.
  • the stacking degree of objects in the present embodiment is the number of stacked objects or the height.
  • An AGV control value calculation uses an occupancy map indicating whether space is occupied by an object.
  • the occupancy map a two-dimensional occupancy grid map is used in which a scene is divided into grids and the probability that an obstacle exists in each grid is held.
  • Each cell of the occupancy map holds a value representing the degree of approach rejection for the AGV (the closer to 0, the more passage is permitted; the closer to 1, the more passage is rejected).
  • The AGV is controlled toward the destination so that it does not pass through areas (grid cells in the present embodiment) whose approach rejection value in the occupancy map is equal to or greater than a predetermined value.
  • the destination is a two-dimensional coordinate which is the destination of the AGV, which is included in the operation information acquired from the process management system 12.
  • the information processing system according to the present embodiment is the same as the system configuration described in FIG.
  • FIG. 14 is a diagram showing a module configuration of the mobile unit 12 including the information processing apparatus 80 according to the eighth embodiment.
  • the information processing apparatus 80 includes an input unit 1110, a position and orientation calculation unit 8110, a semantic information recognition unit 8120, and a control unit 8130.
  • the input unit 1110 is connected to the imaging unit 110 mounted on the moving body 12.
  • the controller 8130 is connected to the actuator 120.
  • A communication device (not shown) exchanges information bidirectionally with the mobile management system 13 and performs input and output with the various units of the information processing device 80.
  • the imaging unit 110, the actuator 120, and the input unit 1110 in the present embodiment are the same as in the first embodiment, and thus the detailed description will be omitted.
  • the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130 will be sequentially described below.
  • the position and orientation calculation unit 8110 calculates the position and orientation of the imaging unit 110 based on the depth map input by the input unit 1110. Also, a three-dimensional map of the scene is created based on the calculated position and orientation. The calculated position and orientation and the three-dimensional map are input to the semantic information recognition unit 8120 and the control unit 8130.
  • The semantic information recognition unit 8120 uses the depth map input by the input unit 1110, the position and orientation calculated by the position and orientation calculation unit 8110, and the three-dimensional map to estimate, as semantic information, the number of stacked objects and their heights.
  • the estimated number and height values are input to the control unit 8130.
  • the control unit 8130 inputs the position and orientation calculated by the position and orientation calculation unit 8110 and the three-dimensional map. Further, the value of the number and height of stacked objects as semantic information estimated by the semantic information recognition unit 8120 is input.
  • The control unit 8130 calculates approach rejection values for objects in the scene based on the input values, and calculates a control value for controlling the AGV so that it does not pass through cells of the occupancy grid whose approach rejection value is equal to or greater than the predetermined value. The control unit 8130 outputs the calculated control value to the actuator 120.
  • FIG. 15 is a flowchart showing the processing procedure of the information processing apparatus 80 in the present embodiment.
  • the processing steps include initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S810, semantic information estimation S820, control value calculation S830, control S160, and system termination determination S170. Note that the initialization S110, the visual information acquisition S120, the visual information input S130, the control S160, and the system termination determination S170 are the same as those in FIG.
  • the steps of position and orientation calculation S810, semantic information estimation S820, and control value calculation S830 will be described in order below.
  • In step S810, the position and orientation calculation unit 8110 calculates the position and orientation of the imaging unit 110 and creates a three-dimensional map.
  • This is realized by an SLAM (Simultaneous Localization and Mapping) algorithm that performs position and orientation estimation while creating a map based on the position and orientation.
  • the position and orientation are calculated by the ICP algorithm such that the difference in depth of the depth map acquired by the imaging unit 110 at a plurality of times is minimized.
  • a three-dimensional map is created using a Point-Based Fusion algorithm that integrates depth maps in time series based on the calculated position and orientation.
  • In step S820, the semantic information recognition unit 8120 divides the depth map and the three-dimensional map into areas, and calculates the stacking number (n) and height (h) of objects for each area. The specific processing procedure is described below.
  • First, the normal direction is calculated from the depth value of each pixel of the depth map and its surrounding pixels.
  • Connected pixels whose normals are similar are assigned a common, unique area identification label as the same object area; in this way, the depth map is segmented. The area identification labels are then propagated to the points of the three-dimensional map referenced by the pixels of the area-divided depth map, so that the three-dimensional map is also divided into areas.
  • Next, bounding boxes are created by dividing the three-dimensional map at equal intervals in the X-Z directions (the movement plane of the AGV). Each bounding box is scanned from the ground upward in the vertical (Y-axis) direction, and the number of distinct area labels among the points it contains is counted. In addition, the maximum height of those points above the ground (X-Z plane) is calculated. The calculated label count n and maximum height h are stored in the three-dimensional map for each point; a sketch is given below.
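  • A minimal sketch of counting the label number n and maximum height h per X-Z grid cell; the grid pitch, the data layout, and treating Y as the height axis are assumptions.

```python
import numpy as np


def stack_stats_per_cell(points: np.ndarray, labels: np.ndarray,
                         cell_size: float = 0.5) -> dict:
    """For each X-Z grid cell, count the distinct area labels (n) and find
    the maximum height above the ground (h).

    points : (N, 3) array with Y as the height axis (assumption)
    labels : (N,) area identification label per point
    """
    cells = {}
    ij = np.floor(points[:, [0, 2]] / cell_size).astype(int)
    for (i, j), label, y in zip(map(tuple, ij), labels, points[:, 1]):
        n_set, h = cells.setdefault((i, j), (set(), 0.0))
        n_set.add(int(label))
        cells[(i, j)] = (n_set, max(h, float(y)))
    return {c: (len(n_set), h) for c, (n_set, h) in cells.items()}
```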
  • In step S830, the control unit 8130 creates an occupancy map based on the three-dimensional map. The approach rejection values of the occupancy map are then updated from the stacking number (n) and height (h) of the objects, and the AGV is controlled based on the updated occupancy map.
  • the three-dimensional map created in step S810 is projected onto the XZ plane, which is a floor surface corresponding to the movement plane of the AGV, to obtain a 2D occupancy map.
  • The approach rejection value of each grid cell of the occupancy map is updated from the distance between that cell and the points obtained by projecting the three-dimensional map onto the X-Z plane, and from the stacking number (n) and height (h) stored for those points.
  • Let p_i be the X-Z coordinates obtained by projecting the i-th point P_i of the point cloud onto the X-Z plane, and let q_j be the coordinates of the j-th cell Q_j of the occupancy map.
  • The approach rejection value of a cell is set larger as h and n become larger, and smaller as the distance d_ij becomes larger, where d_ij is the Euclidean distance between p_i and q_j; a sketch of one such function is given below.
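  • Since the exact update formula is not reproduced here, the following sketch uses one assumed function that grows with n and h and decays with distance, clipped to [0, 1].

```python
import numpy as np


def approach_rejection(p: np.ndarray, n: np.ndarray, h: np.ndarray,
                       q: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Approach rejection values for occupancy-map cells q (M, 2), computed
    from projected object points p (N, 2) with stack counts n (N,) and
    heights h (N,); value_j = max_i (n_i * h_i) * exp(-d_ij / sigma)."""
    d = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=2)   # d_ij, shape (M, N)
    raw = (n * h)[None, :] * np.exp(-d / sigma)
    return np.clip(raw.max(axis=1), 0.0, 1.0)
```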
  • A control value is then calculated so that the difference between the AGV's pose and the target position and orientation is minimized while the AGV avoids grid cells with high approach rejection values in the occupancy map.
  • The control value calculated by the control unit 8130 is output to the actuator 120.
  • As described above, the stacking number and height of objects around the AGV are estimated as the semantic information, and the AGV is controlled to keep a greater distance from objects as these values increase. This allows the AGV to travel farther away from, for example, shelves or pallets loaded with many articles, as in a distribution warehouse, so that the AGV can be operated more safely.
  • the imaging unit 110 in the present embodiment may be anything as long as it can acquire an image and a depth map, such as a TOF camera or a stereo camera. Furthermore, an RGB camera that acquires only an image, or a monocular camera such as a monochrome camera may be used. When a single-eye camera is used, depth is required for position and orientation calculation and occupancy map generation processing, but the present embodiment is realized by calculating the depth value from the movement of the camera.
  • the imaging unit 110 described in the following embodiments is also configured in the same manner as the present embodiment.
  • the value of the approach rejection degree of the occupancy map is not limited to the method described in the present embodiment as long as it is a function whose value is larger as the height of the object is higher and the stacking number is larger and smaller as the distance is larger.
  • it may be a function proportional to the height or stacking number of the object, or may be a function inversely proportional to the distance.
  • The function may consider only one of the height of the object or the number of stacks, or the value may be determined with reference to a list that stores occupancy values according to the distance, the height of the object, and the number of stacks. Note that this list may be stored in advance in the external memory (H14), or it may be held by the mobile management system 13 and downloaded to the information processing apparatus 80 via the communication I/F (H17) as necessary.
  • The occupancy map is not limited to the form described in the present embodiment; anything that can determine the presence or absence of an object in the space may be used. For example, objects may be represented as point clouds of a predetermined radius or approximated by some function. Not only a two-dimensional but also a three-dimensional occupancy map may be used; for example, the map may be held as a signed distance field, that is, a TSDF (Truncated Signed Distance Function), stored in a 3D voxel space (X, Y, Z).
  • In the present embodiment, the control value is calculated using an occupancy map whose approach rejection degree varies with the height and stacking number of objects, but the method is not limited to this; the control value may be changed in any way based on the semantic information of the objects.
  • the control value may be determined with reference to a list describing a control method according to the height and stacking number of objects.
  • The list describing the control method defines operations, such as turning left or decelerating, to be performed when conditions on the height and stacking number of objects are satisfied.
  • the AGV may be controlled based on a predetermined rule, such as calculating a control value that rotates so as not to appear in the field of view when objects of a predetermined height or stacking number are found.
  • the AGV may be controlled by applying a function having a measured value as a variable, such as calculating a control value that reduces the speed as the height and the number of stacks increase.
  • the imaging unit 110 is mounted on the AGV.
  • the imaging unit 110 need not be mounted on the AGV as long as it can capture the traveling direction of the AGV.
  • a surveillance camera attached to a ceiling may be used as the imaging device 110.
  • the imaging device 110 can capture an AGV, and the position and orientation with respect to the imaging device 110 can be determined by, for example, an ICP algorithm.
  • a marker may be attached to the upper part of the AGV, and the position and orientation may be obtained by the imaging device 110 detecting the marker.
  • the imaging device 110 may detect an object on the traveling route of the AGV.
  • the imaging device 110 may be one or more.
  • the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130 do not have to be mounted on the AGV.
  • For example, the control unit 8130 may be mounted on the mobile management system 13. In this case, this can be realized by transmitting and receiving the necessary information through the communication I/F (H17). In this way, a large computer does not need to be mounted on the mobile AGV and its weight can be reduced, so the AGV can be operated efficiently.
  • the semantic information is the stacking degree of objects.
  • the semantic information recognition unit 8120 may recognize any semantic information as long as the AGV can calculate the control value for operating safely and efficiently.
  • the control unit 8130 may calculate the control value using the semantic information.
  • the position of the structure may be recognized as semantic information.
  • the degree of opening of a "door” which is a structure in a factory can also be used as semantic information.
  • the AGV runs slower when the door is open or opening as compared to when the door is closed. Also, it recognizes that an object is suspended by a crane, and controls so as not to get into the lower part of the object. By doing this, the AGV can be operated more safely.
  • Although the stacking degree is recognized in the present embodiment, it is also possible to recognize that objects are lined up close to each other. For example, a plurality of carts may be recognized, and if the distances between them are smaller than a predetermined value, a control value is calculated so as to keep at least a predetermined distance from them.
  • Other AGVs and the packages located above them may be recognized so as to determine that the packages are loaded on those AGVs. If another AGV is carrying a package, the AGV itself performs the avoidance; otherwise, it proceeds straight ahead, and a signal requesting the other AGV to avoid may be sent via the mobile management system 13.
  • The size of the package may also be recognized and the control method determined according to it. By determining whether a package is loaded and how large it is, the avoidance operation can be assigned to the AGV without a package or to the AGV carrying the smaller package, reducing the energy and time required for the movement and allowing the AGVs to be operated efficiently.
  • Alternatively, a control value may be calculated so that the AGV itself performs the avoidance.
  • In this way, the AGVs can be operated safely with less damage to the packages.
  • The outer shape of an object shown in the input image may also be used as the semantic information. Specifically, when the detected object is sharp or pointed, the AGV travels at a greater distance from it and can thus be operated safely without damage. For a flat object such as a wall, keeping a fixed distance suppresses fluctuation of the AGV's trajectory and allows stable, efficient operation.
  • the degree of danger or fragility of the object itself may be recognized. For example, when recognizing the letters “danger” and the mark on the cardboard, the AGV is controlled to move away from the cardboard by a predetermined distance or more. By doing this, it is possible to operate the AGV more safely on the basis of the danger or fragility of the object.
  • The lighting state of a stack light indicating the operating state of an automatic machine in the factory may also be recognized, and the control value calculated so that the AGV does not approach within a predetermined distance while the machine is in operation. In this way, the AGV is not detected by the safety sensors of the automatic machine, the machine does not need to stop, and the AGV can be operated efficiently.
  • the control method is not limited to the above method, and any method that can operate AGV efficiently and safely can be used.
  • acceleration and deceleration parameters may be changed.
  • precise control can be performed such as whether to decelerate gently or suddenly decelerate according to the semantic information.
  • the parameters of the avoidance may be changed, or the control may be switched such as whether to avoid near the object, to largely avoid, to change the route and to avoid, or to stop.
  • the frequency of calculation of control value of AGV is increased or decreased. By increasing the frequency, finer control can be achieved, and by decreasing the frequency, slow control can be achieved.
  • AGV is operated more efficiently and safely by changing the control method based on the semantic information.
  • the AGV is controlled based on static semantic information around a certain time, such as the stacking degree and shape of objects existing around the AGV, and the state of the structure.
  • In the present embodiment, AGVs are controlled based on temporal changes in such information.
  • the semantic information in the present embodiment refers to the amount of movement of an object shown in an image.
  • In addition to the movement amount of an object shown in the image, the type of the object is also recognized, and a method of calculating the control value of the AGV based on the result is described. Specifically, other AGVs and the packages placed on them are recognized as the types of surrounding objects, together with the movement amounts of the other AGVs, and the control value of the AGV itself or of another AGV is calculated based on the recognition results.
  • the configuration of the information processing apparatus in the present embodiment is the same as that of FIG. 14 of the information processing apparatus 80 described in the eighth embodiment, and thus the description thereof is omitted.
  • The difference from the eighth embodiment is that the semantic information estimated by the semantic information recognition unit 8120 and input to the control unit 8130 consists of the detected object types (another AGV and the load placed on it) and the movement amount of the other AGV.
  • the diagram of the processing procedure in the present embodiment is the same as FIG. 15 for describing the processing procedure of the information processing apparatus 80 described in the eighth embodiment, and therefore the description thereof is omitted. What differs from the eighth embodiment is the processing contents of the semantic information estimation step S820 and the control value calculation S830.
  • In step S820, the semantic information recognition unit 8120 divides the depth map into areas and estimates the type of object for each area. At the same time, the position and size of each object are estimated. Next, among the detected objects, the current position of another AGV is compared with its past position to calculate its movement amount. In the present embodiment, the movement amount of another AGV is the amount of change in its position and orientation relative to the AGV itself.
  • the depth map is divided into areas based on the image and the depth map, and an object type for each area is specified.
  • Next, the area recognized as an AGV is extracted, and its relative positional relationship with the other areas is calculated. An area whose distance to the AGV area is smaller than a predetermined threshold and which lies in the vertical (Y-axis) direction above the AGV area is determined to be the cargo area loaded on that AGV. In addition, the sizes of the AGV and of the loaded cargo area are obtained, where the size is the length of the long side of the bounding box enclosing the area.
  • the regions recognized as AGV at time t-1 and time t are extracted respectively, and their relative positional relationship is calculated using an ICP algorithm.
  • The calculated change in the relative positional relationship is the amount of change in the position and orientation of the other AGV relative to the AGV itself; this is hereinafter referred to as the movement amount of the other AGV. A computation sketch is given below.
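  • A minimal sketch of computing such a movement amount from two relative poses; representing the change as a translation norm and a rotation angle is an illustrative choice, not necessarily the embodiment's representation.

```python
import numpy as np


def movement_amount(T_prev: np.ndarray, T_curr: np.ndarray):
    """Change in the relative position and orientation of another AGV.

    T_prev, T_curr are 4x4 poses of the other AGV relative to the AGV itself
    at times t-1 and t (e.g. obtained by ICP). Returns the translation norm
    and the rotation angle of the relative change.
    """
    delta = np.linalg.inv(T_prev) @ T_curr
    trans = float(np.linalg.norm(delta[:3, 3]))
    cos_theta = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return trans, float(np.arccos(cos_theta))
```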
  • in the control value calculation S830, the control unit 8130 decides the action of its own AGV based on the movement amount of the other AGV calculated by the semantic information recognition unit 8120 in step S820 and on the sizes of the other AGV and of the package mounted on it.
  • the control value is not changed when the other AGV moves away.
  • otherwise, a new control value is calculated based on the size of the package. Specifically, the size of one's own AGV, stored in advance in the RAM (H13) through input means (not shown), is compared with the size of the other AGV and of its package. If one's own AGV is smaller, it performs route planning to avoid the other AGV. If one's own AGV is larger, it decelerates and sends a signal to the mobile management system 13 through the communication interface H17 so that the mobile management system 13 makes the other AGV perform the avoidance operation. A minimal sketch of this decision logic is given after this bullet list.
  • in this way, the control value is calculated based on the result of determining the types of objects around the AGV as the semantic information and, in the case of another AGV, further estimating its movement amount and the size of the loaded luggage.
  • when the other AGV or its package is larger than one's own AGV, control is performed such that one's own AGV performs the avoidance; conversely, when it is smaller, the other AGV is made to avoid.
  • in the present embodiment, another AGV is detected as the surrounding mobile body.
  • any object may be detected as long as at least the position and the posture change and the control of the AGV can be changed accordingly.
  • a forklift or a mobile robot may be detected as the mobile body.
  • the amount of change in the position or posture of a part of an apparatus may also be recognized as the semantic information, and the control of the AGV may be changed accordingly. For example, if the movable unit of a machine such as an automatic machine, a robot arm, or a belt conveyor moves faster than a predetermined operation speed, the AGV may be controlled so as to keep at least a predetermined distance from it.
  • the control value is calculated such that one of the AGV and the other AGV avoids, but any control method may be used as long as the control value is changed according to the movement of the object.
  • the value of the approach rejection degree of the occupancy map described in the eighth embodiment may be dynamically updated according to the magnitude of the movement amount, and the control value of the AGV may be calculated using this.
  • a control value may be calculated so that one's own AGV follows the other AGV when the other AGV is moving in the same direction as one's own AGV. When a crossroad is reached and another AGV has already entered it from the lateral direction, a control value may be calculated to wait until that AGV has finished passing, or, if the user gives priority at the crossroad via the mobile management system 13, a control value may be calculated that makes the other AGV stand by. Furthermore, when another AGV is observed to vibrate left and right with respect to its traveling direction, or when the load mounted on another AGV is observed to vibrate with respect to that AGV, a control value may be calculated that selects a route keeping at least a certain distance from it.
  • the work process may be further recognized as semantic information from the movement of the object. For example, it may be recognized that the robot is in the process of loading another AGV. At this time, the control value may be calculated such that oneself (AGV) searches for another route.
  • as described above, the movement of an object is recognized as the semantic information, and moving bodies such as one's own AGV or a forklift are controlled accordingly so as to operate more efficiently.
  • a method for safely operating an AGV will be described based on the result of recognizing the work and role of a person.
  • the work type of a person is estimated from the person and the type of object held by the person as the semantic information, and the AGV is controlled according to the work type.
  • for example, when a person and a hand lift pushed by the person are detected, transport work is recognized as the work type and the AGV is controlled to avoid the person; when a person and a welder held by the person are detected, welding work is recognized as the work type and control such as changing the AGV route is performed.
  • the approach rejection degree parameter that determines how the AGV is controlled for each combination of a person and an object possessed by the person is given manually in advance.
  • the parameter is, for example, 0.4 when a person is carrying a large package, 0.6 when a person is pushing a cart, and 0.9 when a person is holding a welding machine.
  • the mobile unit management system 13 holds a list of these parameters.
  • the list can be downloaded from the mobile management system 13 to the information processing apparatus 80 via the communication I/F (H17), and can be stored in and referenced from the external memory (H14).
  • the configuration of the information processing apparatus in the present embodiment is the same as that of FIG. 14 of the information processing apparatus 80 described in the eighth embodiment, and thus the description thereof is omitted.
  • the difference from the eighth embodiment is that the semantic information that the semantic information recognition unit 8120 estimates and inputs to the control unit 8130 is different.
  • the diagram of the processing procedure in the present embodiment is the same as FIG. 15 for describing the processing procedure of the information processing apparatus 80 described in the eighth embodiment, and therefore the description thereof is omitted. What differs from the eighth embodiment is the processing contents of the semantic information estimation step S820 and the control value calculation S830.
  • the semantic information recognition unit 8120 recognizes a person and an object type held by the person from the input image. Then, the AGV is controlled based on the parameter list in which the control rules of the AGV corresponding to the person and the object held by the person stored in advance in the external memory H14 are recorded.
  • the part of the body corresponding to a human hand is detected from the visual information.
  • a method is used that recognizes each human body part and their connections and estimates the human skeleton. Image coordinates corresponding to the position of the human hand are then acquired.
  • an object type held by a person is detected.
  • the neural network described in the sixth embodiment and trained to divide an image into regions according to object types is used.
  • an area within a predetermined distance from the image coordinates of the position of the human hand is recognized as an object area held by a person, and an object type assigned to the area is acquired.
  • the object type mentioned here is uniquely associated with the object ID held by the above list.
  • the parameter of the approach rejection degree is acquired.
  • the acquired information is input to the control unit 8130 by the semantic information recognition unit 8120.
  • control unit 8130 determines the action of itself (AGV) based on the parameter of the approach rejection degree of the object calculated by the semantic information recognition unit 8120 in step S820.
  • the control value is calculated by updating the approach rejection degree of the occupancy map described in the eighth embodiment, where Score_j, the value of the j-th grid, is computed from parameters s_i, with s_i representing the approach rejection degree of the i-th object detected in step S820. The update function increases as s_i increases and decreases as the distance from the object increases (see the sketch after this bullet list).
  • the travel route of the AGV is determined as described in the eighth embodiment using the occupancy map defined as described above.
  • a control value is also calculated that limits the maximum velocity v_max of the AGV based on the approach rejection degree of the occupancy-map grid the AGV is currently passing through. In the corresponding expression, one term is an adjustment parameter between the approach rejection degree of the occupancy map and the speed, and α is the approach rejection degree of the grid the AGV is currently passing.
  • v_max is calculated so that it approaches 0 as the approach rejection degree of the occupancy map increases (approaches 1). The control value calculated by the control unit 8130 in this manner is output to the actuator 130.
  • as described above, the work type of a person is determined from the combination of the person and the object held by the person, and a parameter representing the approach rejection degree is determined accordingly. The control value is then calculated so that the larger the approach rejection degree, the greater the distance the AGV keeps from the person and the slower it moves. This keeps the AGV at an appropriate distance according to the person's work, so the AGV can be controlled more safely.
  • a person's clothing may be recognized as semantic information. For example, in a factory, a person wearing work clothes can be assumed to be a worker and a person wearing a suit to be a visitor. Using this recognition result, the AGV is controlled more safely by moving more slowly when passing near a visitor, who is less accustomed to the movement of the AGV than a worker.
  • a person's age may be recognized as semantic information. For example, for an AGV that performs in-hospital delivery, when a child or an elderly person is recognized, the AGV can be operated more safely by passing slowly at a predetermined distance.
  • a person's movement may be recognized as semantic information. For example, for an AGV carrying luggage in a hotel, when it is recognized that a person repeatedly sways back and forth and left and right, as when walking unsteadily, a control value is calculated to pass at a predetermined distance, so the AGV can be operated safely.
  • a control value may be calculated so that the AGV approaches a worker slowly and stops until the loading of the package is finished. In this way, the worker does not need to walk to the stop position of the AGV to load the package, and the work can be performed efficiently.
  • the number of people may be recognized as semantic information. Specifically, the route is changed when more people than a predetermined number are recognized on the planned route of the AGV. This avoids having to thread through a crowd and the associated risk of contact, so the AGV can be operated more safely.
  • the configuration of the device in the eleventh embodiment is the same as that of FIG. 2 showing the configuration of the information processing device 80 described in the eighth embodiment, and thus the description thereof is omitted.
  • the configuration of the device for display is the same as the configuration described in the seventh embodiment, and is thus omitted.
  • FIG. 13 shows a GUI 200 which is an example of display information presented by the display device according to the present embodiment.
  • G210 is a window for presenting the visual information acquired by the imaging unit 110 and the semantic information recognized by the semantic information recognition unit 8120.
  • G220 is a window for presenting the approach rejection for navigation of the AGV described in the eighth embodiment.
  • G230 is a window for presenting a 2D occupancy map.
  • G240 is a window that provides a GUI for manually operating the AGV and presents the values calculated by the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, together with AGV operation information.
  • G210 shows an example of presentation of a plurality of objects, their relative distances, and values of the approach rejection degree as the semantic information detected by the semantic information recognition unit 8120.
  • G211 is a bounding box of a detected object. In this embodiment, bounding boxes that surround the detected other AGVs and their packages are indicated by dotted lines. Although a single bounding box is presented here for the integrated set of objects, a bounding box may instead be drawn for each detected object. The bounding box may be drawn in any way as long as the position of the detected object can be seen, for example with a dotted or solid line, or by superimposing a semitransparent mask.
  • G212 is a pop-up that presents the detected semantic information: the detected object types, their distances, and the values of the approach rejection degree. By superimposing the recognized semantic information on the visual information in this way, the user can intuitively associate the visual information with the semantic information and grasp it.
  • G220 is an example in which the proximity rejection degree of the AGV calculated by the control unit 8130 is superimposed on the visual information acquired by the imaging unit 110.
  • G221 superimposes darker colors as the degree of approach rejection is higher. By presenting the approach rejection degree superimposed on the visual information in this manner, the user can intuitively associate the visual information with the approach rejection degree and grasp the information. Note that G221 may allow the user to more easily understand the approach rejection degree by changing the color, density, or shape.
  • G230 is an example of presenting the occupancy map calculated by the control unit 8130 and the semantic information recognized by the semantic information recognition unit 8120.
  • G231 visualizes the approach rejection degree of the occupancy map so that the displayed intensity becomes stronger as the value becomes larger and weaker as the value becomes smaller.
  • G232 further presents the position of a structure as semantic information recognized by the semantic information recognition unit 8120. In the present embodiment, the result of recognizing that a factory door is open is presented as an example.
  • G233 further presents the movement amounts of surrounding objects as semantic information recognized by the semantic information recognition unit 8120. In the present embodiment, the moving direction and the speed of the object are presented.
  • by presenting the information in this way, the user can easily associate these items and grasp the internal state of the AGV. Presenting the occupancy map in this manner also lets the user easily follow the route generation process of the control unit 8130.
  • G240 thus provides a GUI for manually operating the AGV and an example presentation of the values calculated by the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, together with the operation information of the AGV.
  • G241 is a GUI for setting the semantic information recognized by the semantic information recognition unit 8120 and whether or not to display the recognition result, and is, for example, a radio button for switching on / off of an item.
  • G 242 is a GUI for adjusting the proximity rejection distance calculated by the control unit 8130 and parameters for calculating the control value, and corresponds to, for example, a slide bar or a number input form.
  • the GUI described in the present embodiment is an example; any visualization method may be used as long as the GUI presents the semantic information calculated by the semantic information recognition unit 8120, the approach rejection degree of the occupancy map calculated by the control unit 8130, and so on, and lets the user grasp the internal state of the AGV.
  • the way the display information is drawn can also be changed, for example by changing colors, switching between thick and thin, solid, broken, or double lines, scaling, or hiding unnecessary information. By changing the visualization of the display information in this manner, the user can understand it more intuitively.
  • the present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium and having one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of those functions.
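As a rough illustration of the ninth embodiment's decision logic above (movement amount of another AGV and size-based avoidance), a minimal sketch follows. It is not the patented implementation: the function names, the centroid-based stand-in for the ICP alignment, and the "moving away" test on the relative depth are all assumptions made for brevity.

```python
import numpy as np

def movement_of_other_agv(region_t, region_t_minus_1):
    """Approximate the change in relative position of another AGV between two
    frames. The text uses an ICP alignment of the two regions; a centroid
    difference is used here as a simplified stand-in."""
    return np.mean(np.asarray(region_t), axis=0) - np.mean(np.asarray(region_t_minus_1), axis=0)

def bounding_box_size(points):
    """Size of a region = length of the long side of its bounding box."""
    pts = np.asarray(points, dtype=float)
    return float(np.max(pts.max(axis=0) - pts.min(axis=0)))

def decide_action(own_size, other_agv_points, other_load_points, movement):
    """Return a control decision based on the other AGV's movement and size:
    keep the current control if it moves away; otherwise the smaller vehicle
    replans to avoid, the larger one decelerates and asks the mobile
    management system to make the other AGV avoid."""
    if movement[2] > 0:  # hypothetical test: relative depth increasing = moving away
        return "keep_current_control"
    other_size = max(bounding_box_size(other_agv_points),
                     bounding_box_size(other_load_points))
    if own_size < other_size:
        return "replan_route_to_avoid"
    return "decelerate_and_request_other_to_avoid"
```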
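For the tenth embodiment, the occupancy-map update and speed-limit equations referred to in the text are not reproduced here, so the sketch below only illustrates one possible pair of functions with the stated properties: the grid score Score_j grows with each detected object's approach rejection degree s_i and decays with distance, and the speed limit v_max approaches 0 as the grid value α approaches 1. The exponential and linear forms are assumptions, not the patent's formulas.

```python
import math

def grid_score(grid_xy, detections):
    """Score of one occupancy-map grid cell.

    detections: list of (s_i, (x_i, y_i)) pairs, where s_i is the approach
    rejection degree of the i-th object detected in step S820. The score
    increases with s_i and decreases with distance to the object; the
    exponential fall-off is an assumption."""
    score = 0.0
    for s_i, (x_i, y_i) in detections:
        d = math.hypot(grid_xy[0] - x_i, grid_xy[1] - y_i)
        score += s_i * math.exp(-d)
    return min(score, 1.0)  # keep the value in [0, 1]

def limited_max_speed(v_nominal, alpha, k=1.0):
    """Upper speed limit while passing a grid with approach rejection alpha.
    k plays the role of the adjustment parameter between rejection degree and
    speed; the linear form only reproduces the stated behaviour that v_max
    approaches 0 as alpha approaches 1."""
    return max(0.0, v_nominal * (1.0 - k * alpha))
```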

Abstract

This information processing device: accepts input of image information acquired by an image capturing unit which is mounted on a moving body and in which each light receiving unit on an image capturing element consists of at least two light receiving elements; holds map information; acquires a position and attitude of the image capturing unit on the basis of the image information and the map information; and obtains a control value for controlling the movement of the moving body on the basis of the position and attitude, acquired by an acquiring means.

Description

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, PROGRAM, AND SYSTEM
 本発明は、移動体の移動制御を行う技術に関する。 The present invention relates to technology for performing movement control of a mobile.
 For example, there are moving bodies such as automated guided vehicles (AGV (Automated Guided Vehicle)) and autonomous mobile robots (AMR (Autonomous Mobile Robot)). When such a body is run in an environment such as a factory or a distribution warehouse, in order to control its movement stably, a tape has conventionally been attached to the floor as in Patent Document 1 and the body has been run while detecting the tape with a sensor mounted on it.
Japanese Patent Application Publication No. 2010-33434
 しかし、特許文献1の技術では、移動体を走行させる環境内で、物のレイアウト変更を行って動線が変わる度に、テープを貼り直す必要があったため、手間がかかっていた。そのような手間を減らし、安定して移動体を走行させることが求められている。 However, in the technology of Patent Document 1, since it is necessary to restick the tape every time the flow line is changed by changing the layout of an object in the environment in which the moving body travels, it takes time and effort. It is required to reduce the time and effort and stably run the moving body.
 本発明は、上記の課題に鑑みてなされたものであり、移動体の移動制御を、安定して行う情報処理装置を提供することを目的とする。また、その方法、及びプログラムを提供することを目的とする。 The present invention has been made in view of the above problems, and an object thereof is to provide an information processing apparatus which stably performs movement control of a mobile body. Moreover, it aims at providing the method and program.
 本発明に係る情報処理装置は以下の構成を備える。 An information processing apparatus according to the present invention has the following configuration.
 移動体に搭載された、撮像素子上の各々の受光部が2以上の受光素子によって構成される撮像部が取得した画像情報の入力を受け付ける入力手段と、
 マップ情報を保持する保持手段と、
 前記画像情報と前記マップ情報とに基づいて前記撮像部の位置姿勢を取得する取得手段と、
 前記取得手段が取得した位置姿勢に基づいて前記移動体の移動を制御する制御値を得る制御手段。
An input unit that receives an input of image information acquired by an imaging unit that is mounted on a moving body and in which each light receiving unit on the imaging element is configured by two or more light receiving elements;
Holding means for holding map information;
Acquisition means for acquiring the position and orientation of the imaging unit based on the image information and the map information;
Control means for obtaining a control value for controlling the movement of the movable body based on the position and orientation acquired by the acquisition means.
 本発明によれば、移動体の移動制御を安定して行うことが出来る。 According to the present invention, movement control of the moving body can be stably performed.
 添付図面は明細書に含まれ、その一部を構成し、本発明の実施の形態を示し、その記述と共に本発明の原理を説明するために用いられる。 The accompanying drawings are included in the specification, constitute a part thereof, show embodiments of the present invention, and are used together with the description to explain the principle of the present invention.
 The drawings are as follows: a diagram illustrating the system configuration in the first embodiment; a diagram illustrating the functional configuration in the first embodiment; three diagrams illustrating the imaging element D150 included in the imaging unit 110; a diagram showing an example of the images 152a to 154d captured by the imaging unit 110; a flowchart showing the flow of processing of the apparatus of the first embodiment; a diagram showing the hardware configuration of the apparatus of the first embodiment; a flowchart showing the procedure of correction processing of visual information using motion stereo in the second embodiment; a diagram illustrating the functional configuration in the third embodiment; a diagram illustrating the functional configuration in the fourth embodiment; a flowchart showing the procedure of correction processing of visual information using the measurement results of the three-dimensional measurement device in the fourth embodiment; a flowchart showing the processing procedure of object detection and position and orientation calculation in the fifth embodiment; a flowchart showing the processing procedure of semantic area division of visual information in the sixth embodiment; a diagram showing an example of a GUI presenting display information; a diagram illustrating the functional configuration in the eighth embodiment; a flowchart showing the flow of processing of the apparatus of the eighth embodiment; and a diagram showing an example of a GUI presenting display information.
 以下、図面を参照しながら実施形態を説明する。なお、以下の実施形態において示す構成は一例に過ぎず、本発明は図示された構成に限定されるものではない。 Hereinafter, embodiments will be described with reference to the drawings. In addition, the structure shown in the following embodiment is only an example, and this invention is not limited to the illustrated structure.
 [実施形態1]
 本実施形態では、搬送車(AGV(Automated Guided Vehicle))または、自律移動ロボット(AMR(Autonomous Mobile Robot))等と称する移動体の移動制御について説明する。以下、移動体としてAGVを例に説明するが、移動体はAMRであっても良い。
Embodiment 1
In this embodiment, movement control of a mobile unit referred to as a guided vehicle (AGV (Automated Guided Vehicle)) or an autonomous mobile robot (AMR (Autonomous Mobile Robot)) will be described. Hereinafter, although AGV is demonstrated to an example as a mobile, a mobile may be AMR.
 図1に、本実施形態におけるシステム構成図を示す。本実施形態における情報処理システム1は、複数の移動体12(12-1、12-2、・・・)、工程管理システム14、移動体管理システム13から構成される。情報処理システム1は、物流システムや生産システムなどである。 FIG. 1 shows a system configuration diagram in the present embodiment. The information processing system 1 in the present embodiment includes a plurality of mobile units 12 (12-1, 12-2,...), A process management system 14 and a mobile unit management system 13. The information processing system 1 is a distribution system, a production system, and the like.
 複数の移動体12(12-1、12-2、・・・)は、工程管理システムで決められた工程のスケジュールに合わせて物体を搬送する搬送車(AGV(Automated Guided Vehicle))である。移動体は環境内で複数台が移動(走行)している。 The plurality of mobile bodies 12 (12-1, 12-2,...) Are transportation vehicles (AGV (Automated Guided Vehicle)) that transport objects in accordance with the schedule of processes determined by the process management system. A plurality of mobile units move (run) within the environment.
 工程管理システム14は、情報処理システムが実行する工程を管理する。例えば、工場や物流倉庫内の工程を管理するMES(Manufacturing Execution System)である。移動体管理システム3と通信を行っている。 The process management system 14 manages the process performed by the information processing system. For example, it is MES (Manufacturing Execution System) which manages the process in a factory or a distribution warehouse. It communicates with the mobile management system 3.
 移動体管理システム13は、移動体を管理するシステムである。工程管理システム12と通信を行っている。また、移動体とも通信(例えば、Wi-Fi通信)を行い、運行情報を双方向に送受信している。 The mobile management system 13 is a system that manages mobiles. It communicates with the process control system 12. In addition, communication (for example, Wi-Fi communication) is also performed with mobiles, and operation information is bidirectionally transmitted and received.
 図2は、本実施形態における情報処理装置10を備える移動体12のハードウェア構成例を示す図である。情報処理装置10は、入力部1110、算出部1120、保持部1130、制御部1140から構成されている。入力部1110は、移動体12に搭載された撮像部110と接続されている。制御部1140は、アクチュエータ120と接続されている。また、これらに加え、不図示の通信装置が移動体管理システム3と情報を双方向に通信を行っており、情報処理装置10の各種手段に入出力している。但し、図2は、機器構成の一例である。 FIG. 2 is a diagram showing an example of a hardware configuration of the mobile unit 12 including the information processing apparatus 10 in the present embodiment. The information processing apparatus 10 includes an input unit 1110, a calculation unit 1120, a holding unit 1130, and a control unit 1140. The input unit 1110 is connected to the imaging unit 110 mounted on the moving body 12. The controller 1140 is connected to the actuator 120. In addition to these, a communication device (not shown) communicates information with the mobile management system 3 in a bi-directional manner, and inputs / outputs to / from various means of the information processing apparatus 10. However, FIG. 2 is an example of a device configuration.
 FIG. 3 is a diagram for explaining the imaging element D150 provided in the imaging unit 110. In the present embodiment, the imaging unit 110 includes the imaging element D150 inside. As shown in FIG. 3A, a large number of light receiving units D151 are arranged in a lattice inside the imaging element D150; FIG. 3A shows four of them. A microlens D153 is provided on the upper surface of each light receiving unit D151 so that light can be collected efficiently. A conventional imaging element has one light receiving element per light receiving unit D151, but in the imaging element D150 of the imaging unit 110 in the present embodiment, each light receiving unit D151 includes a plurality of light receiving elements D152 inside.
 図3Bは、1つの受光部D151に着目し、側面から見た様子を示すものである。図3Bに示すように、1つの受光部D151の内部に2つの受光素子D152aおよび152bが備えられている。個々の受光素子D152a、D152bは互いに独立しており、受光素子D152aに蓄積された電荷が受光素子D152bに移動することはなく、また逆に受光素子D152bに蓄積された電荷が受光素子D152aに移動することはない。そのため、図3Bにおいて、受光素子D152aはマイクロレンズD153の右側から入射する光束を受光することになる。また逆に、受光素子D152bはマイクロレンズD153の左側から入射する光束を受光することになる。 FIG. 3B shows one light receiving unit D 151 as viewed from the side. As shown in FIG. 3B, two light receiving elements D 152 a and 152 b are provided in one light receiving unit D 151. The individual light receiving elements D152a and D152b are independent of each other, and the charge accumulated in the light receiving element D152a does not move to the light receiving element D152b, and conversely, the charge accumulated in the light receiving element D152b moves to the light receiving element D152a There is nothing to do. Therefore, in FIG. 3B, the light receiving element D 152 a receives the light flux incident from the right side of the microlens D 153. On the other hand, the light receiving element D 152 b receives the light flux incident from the left side of the microlens D 153.
 The imaging unit 110 can generate the image D154a by selecting only the charge accumulated in the light receiving elements D152a. At the same time, the imaging unit 110 can generate the image D154b by selecting only the charge accumulated in the light receiving elements D152b. Since the image D154a is generated only from the light flux entering from the right side of the microlens D153 and the image D154b only from the light flux entering from the left side, the images D154a and D154b are, as shown in FIG. 4, images captured from mutually different viewpoints.
 また、撮像部110が各受光部D151から、受光素子D152a、D152bの両方に蓄積されている電荷を用いて画像を形成する。従来の撮像素子を用いた場合と同じようにある視点から撮影した画像である画像D154e(不図示)が得られることになる。撮像部110は、以上説明した原理によって、撮影視点の異なる画像D154a、D154bと、従来の画像154eを同時に撮像することができる。 Further, the imaging unit 110 forms an image from each light receiving unit D 151 using the charges accumulated in both of the light receiving elements D 152 a and D 152 b. As in the case of using a conventional imaging device, an image D154e (not shown) which is an image captured from a certain viewpoint is obtained. The imaging unit 110 can simultaneously capture the images D154a and D154b having different shooting viewpoints and the conventional image 154e according to the principle described above.
 なお、各受光部D151は、より多くの受光素子D152を備えてもよく、任意の数の受光素子D152を設定することができる。例えば、図3Cは、受光部D151の内部に4つの受光素子D152a~D152dを設けた例を示している。 Note that each light receiving unit D151 may include more light receiving elements D152, and an arbitrary number of light receiving elements D152 can be set. For example, FIG. 3C shows an example in which four light receiving elements D152a to D152d are provided inside the light receiving part D151.
 The imaging unit 110 can perform a corresponding point search on the pair of images D154a and D154b to calculate a parallax image D154f (not illustrated), and can further calculate the three-dimensional shape of the target by the stereo method based on that parallax image. Corresponding point search and the stereo method are known techniques, and various methods can be applied. For the corresponding point search, for example, a template matching method that searches for a similar template using the several pixels around each image pixel as a template can be used, or a method that extracts edge and corner feature points from the gradient of the image brightness and searches for points with similar features. In the stereo method, the relationship between the coordinate systems of the two images is derived, a projective transformation matrix is obtained, and the three-dimensional shape is calculated. In addition to the image D154e, the imaging unit 110 has a function of outputting the image D154a, the image D154b, the parallax image D154f, the depth map D154d obtained by the stereo method, and the three-dimensional point group D154c.
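As a rough sketch of the correspondence search and stereo computation described above: a disparity found by template matching along the image row can be converted to depth with the standard pinhole relation Z = f·B/d. The block-matching details and parameter names below are assumptions, not the imaging unit's actual implementation.

```python
import numpy as np

def block_match_disparity(left, right, u, v, window=3, max_disp=64):
    """Find the disparity of pixel (u, v) by template matching along the same
    image row, using the sum of absolute differences. Assumes (u, v) is far
    enough from the image border for the window to fit."""
    template = left[v - window:v + window + 1, u - window:u + window + 1].astype(float)
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp):
        if u - d - window < 0:
            break
        candidate = right[v - window:v + window + 1,
                          u - d - window:u - d + window + 1].astype(float)
        if candidate.shape != template.shape:
            break
        cost = np.abs(template - candidate).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

def depth_from_disparity(disparity, focal_length_px, baseline):
    """Standard pinhole relation Z = f * B / d (undefined for d == 0)."""
    return focal_length_px * baseline / disparity if disparity > 0 else float("inf")
```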
 The depth map referred to here is an image that holds, for each pixel constituting the image 154c, a value correlated with the distance (depth) to the measurement target. Usually, the value correlated with the distance is an integer value that can be stored as an ordinary image, and it can be converted into the physical distance to the target (for example, in millimetres) by multiplying it by a predetermined coefficient determined from the focal length. This focal length is included in the unique information of the imaging unit 110 as described above.
 The three-dimensional point group D154c will also be described. It is a set of coordinates in which the physical distances to the measurement target converted from the depth map D154d as described above are expressed as values along each axis (X, Y, Z) of an orthogonal coordinate system set separately in three-dimensional space, whose origin is the optical centre of the imaging unit.
 Since the imaging unit 110 can obtain the pair of images D154a and D154b with different viewpoints from a single imaging element D150, three-dimensional measurement can be realized with a more compact configuration than the conventional stereo method, which requires two or more imaging units.
 撮像部D110は、さらに光学系の焦点距離を制御するオートフォーカス機構および画角を制御するズーム機構を備える。オートフォーカス機構は有効あるいは無効を切り替え可能であり、設定した焦点距離を固定することができる。撮像部D110は、焦点および画角を制御するために設けられた光学系制御モータの回転角あるいは移動量といった駆動量によって規定される制御値を読み取り、不図示のルックアップテーブルを参照して焦点距離を算出し、出力することができる。また撮像部D110は、装着されたレンズから、焦点距離範囲、口径、ディストーションの係数、光学中心などのレンズの固有情報を読み取ることができる。読み取った固有情報を、後述する視差画像D154f及びデプスマップD154dのレンズ歪みの補正や、三次元点群D154cの算出に用いる。 The imaging unit D110 further includes an autofocus mechanism that controls the focal length of the optical system and a zoom mechanism that controls the angle of view. The auto focus mechanism can be switched on or off, and the set focal length can be fixed. The imaging unit D110 reads a control value defined by a drive amount such as a rotation angle or movement amount of an optical system control motor provided to control a focus and an angle of view, and refers to a lookup table (not shown). The distance can be calculated and output. Further, the imaging unit D110 can read, from the mounted lens, unique information of the lens such as a focal length range, an aperture, a distortion coefficient, and an optical center. The read inherent information is used for correction of lens distortion of a parallax image D 154 f and a depth map D 154 d described later, and calculation of a three-dimensional point group D 154 c.
 The imaging unit 110 has a function of correcting the lens distortion of the images D154a and D154b, the parallax image D154f, and the depth map D154d, and of outputting the image coordinates of the principal point position (hereinafter referred to as the image centre) and the baseline length between the images D154a and D154b. It also has a function of outputting the generated images 154a to 154c, optical system data such as the focal length and the image centre, and three-dimensional measurement data such as the parallax image D154f, the baseline length, the depth map D154d, and the three-dimensional point group D154c. In the present embodiment, these data are collectively referred to as image information (hereinafter also referred to as "visual information"). The imaging unit 110 selectively outputs all or part of the image information in accordance with parameters set in a storage area (not shown) inside the imaging unit 110 or with instructions given from outside the imaging unit 110.
 本実施形態における移動制御とは、移動体が備えるアクチュエータであるモータ、および車輪の向きを変更するステアリングを制御することである。これらを制御することで、移動体を所定の目的地まで移動させる。また、制御値とは移動体を制御するための指令値のことである。 The movement control in the present embodiment is to control a motor that is an actuator included in the moving body and a steering that changes the direction of the wheel. By controlling these, the mobile unit is moved to a predetermined destination. Also, the control value is a command value for controlling the moving body.
 The position and orientation of the imaging unit in the present embodiment are the six parameters consisting of three parameters representing the position of the imaging unit 110 in an arbitrary world coordinate system defined in the real space and three parameters representing the orientation of the imaging unit 110. The mounting position of the imaging device relative to the centre of gravity of the moving body is measured at the design stage of the moving body such as the AGV, and a matrix representing this mounting position and orientation is stored in the external memory H14. The position of the centre of gravity of the AGV can be calculated by multiplying the position and orientation of the imaging unit by the matrix representing the mounting position and orientation. For this reason, in the present embodiment the position and orientation of the imaging unit are treated as synonymous with the position and orientation of the AGV. A three-dimensional coordinate system defined on the imaging unit, with the optical axis of the imaging unit 110 as the Z axis, the horizontal direction of the image as the X axis, and the vertical direction as the Y axis, is called the imaging unit coordinate system.
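Since the paragraph above describes obtaining the AGV pose by multiplying the imaging unit's pose by the stored mounting transform, here is a minimal sketch using 4x4 homogeneous matrices; the multiplication order and matrix convention are assumptions made for illustration.

```python
import numpy as np

def pose_matrix(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# world_T_camera: imaging-unit pose in the world frame (from the calculation unit)
# camera_T_agv:   mounting transform measured at design time and stored in memory
def agv_pose_in_world(world_T_camera, camera_T_agv):
    """AGV (centre-of-gravity) pose = camera pose composed with the mounting transform."""
    return world_T_camera @ camera_T_agv
```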
 The input unit 1110 receives, as the image information (visual information) acquired by the imaging unit 110, a depth map in which a depth value is stored for each pixel of an image of the scene, input in time series (for example, 60 frames per second), and outputs it to the calculation unit 1120. The depth value is the distance between the imaging unit 110 and an object in the scene.
 算出部1120は、入力部1110が入力したデプスマップ、保持部1130が保持する位置姿勢算出の指標となるマップ情報を用いて撮像部の位置姿勢を算出し、取得する。なお、マップ情報については後述する。算出部1120はさらに、算出した位置姿勢を制御部1140に出力する。尚、算出部では、位置姿勢を出力するために必要な情報を入力部から取得して、保持部1130で保持しているマップ情報と比較するだけでも良い。 The calculation unit 1120 calculates and acquires the position and orientation of the imaging unit using the depth map input by the input unit 1110 and map information serving as an index of position and orientation calculation held by the holding unit 1130. The map information will be described later. The calculation unit 1120 further outputs the calculated position and orientation to the control unit 1140. The calculation unit may obtain information necessary for outputting the position and orientation from the input unit and may simply compare the information with the map information held by the holding unit 1130.
 保持部1130は、マップ情報としてポイントクラウドを保持する。ポイントクラウドとはシーンの三次元点群データのことである。本実施形態では、ポイントクラウドは任意の世界座標系における三次元座標(X,Y,Z)の三値を格納したデータリストとして保持部1130が保持しているものとする。三次元点群データは、三次元位置情報を示している。また、これらに加え、AGVの目的地である三次元座標と姿勢を表す目的位置姿勢を保持する。目標位置姿勢は1つでも複数あってもよいが、ここでは簡単のため目標位置姿勢が1地点である例を説明する。また、保持部1130はマップ情報を必要に応じて算出部1120に出力する。さらに、目標位置姿勢を制御部1140に出力する。 The holding unit 1130 holds a point cloud as map information. The point cloud is three-dimensional point cloud data of a scene. In this embodiment, the point cloud is held by the holding unit 1130 as a data list storing three values of three-dimensional coordinates (X, Y, Z) in an arbitrary world coordinate system. Three-dimensional point cloud data indicates three-dimensional position information. Also, in addition to these, the three-dimensional coordinates that are the destination of the AGV and the target position and posture representing the posture are held. The target position and orientation may be one or more, but for the sake of simplicity, an example in which the target position and orientation is one point will be described. Also, the holding unit 1130 outputs map information to the calculation unit 1120 as needed. Furthermore, the target position and orientation are output to the control unit 1140.
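A minimal sketch of the data the holding unit is described as keeping (a point cloud stored as a list of (X, Y, Z) values plus a target position and orientation); the class and field names are illustrative only, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MapInformation:
    # Point cloud: list of (X, Y, Z) coordinates in an arbitrary world coordinate system
    points: List[Tuple[float, float, float]] = field(default_factory=list)
    # Destination of the AGV: target position (x, y, z) and orientation (e.g. roll, pitch, yaw)
    target_position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    target_orientation: Tuple[float, float, float] = (0.0, 0.0, 0.0)
```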
 The control unit 1140 calculates a control value for controlling the AGV based on the position and orientation of the imaging unit 110 calculated by the calculation unit 1120, the map information held by the holding unit 1130, and the operation information input by the communication device (not shown). The calculated control value is output to the actuator 120.
 図6は、情報処理装置1のハードウェア構成を示す図である。H11はCPUであり、システムバスH21に接続された各種デバイスの制御を行う。H12はROMであり、BIOSのプログラムやブートプログラムを記憶する。H13はRAMであり、CPUであるH11の主記憶装置として使用される。H14は外部メモリであり、情報処理装置1が処理するプログラムを格納する。入力部H15はキーボードやマウス、ロボットコントローラーであり、情報等の入力に係る処理を行う。表示部H16はH11からの指示に従って情報処理装置1の演算結果を表示装置に出力する。なお、表示装置は液晶表示装置やプロジェクタ、LEDインジケーターなど、種類は問わない。また、情報処理装置が備える表示部H16が表示装置としての役割であってもよい。H17は通信インターフェイスであり、ネットワークを介して情報通信を行うものであり、通信インターフェイスはイーサネット(登録商標)でもよく、USBやシリアル通信、無線通信等種類は問わない。なお、前述した移動体管理システム13とは通信インターフェイスH17を介して情報のやり取りを行う。H18はI/Oであり、撮像装置H19から画像情報(視覚情報)を入力する。なお、撮像装置H19とは前述した撮像部110のことである。H20は前述したアクチュエータ120のことである。 FIG. 6 is a diagram showing a hardware configuration of the information processing apparatus 1. A CPU H11 controls various devices connected to the system bus H21. H12 is a ROM, which stores a BIOS program and a boot program. H13 is a RAM, which is used as a main storage device of the CPU H11. An external memory H14 stores a program processed by the information processing apparatus 1. The input unit H15 is a keyboard, a mouse, or a robot controller, and performs processing related to input of information and the like. The display unit H16 outputs the calculation result of the information processing device 1 to the display device according to the instruction from H11. The display device may be of any type such as a liquid crystal display device, a projector, or an LED indicator. Further, the display unit H16 included in the information processing apparatus may play a role as a display device. A communication interface H17 performs information communication via a network. The communication interface may be Ethernet (registered trademark), and may be of any type such as USB, serial communication, or wireless communication. Information is exchanged with the mobile object management system 13 described above via the communication interface H17. H18 is I / O, and inputs image information (visual information) from the imaging device H19. The imaging device H19 is the imaging unit 110 described above. H20 is the actuator 120 described above.
 次に、本実施形態における処理手順について説明する。図5は、本実施形態における情報処理装置10の処理手順を示すフローチャートである。以下、フローチャートは、CPUが制御プログラムを実行することにより実現されるものとする。処理ステップは、初期化S110、視覚情報取得S120、視覚情報入力S130、位置姿勢算出S140、制御値算出S150、AGVの制御S160、システム終了判定S170から構成されている。 Next, the processing procedure in the present embodiment will be described. FIG. 5 is a flowchart showing the processing procedure of the information processing apparatus 10 in the present embodiment. Hereinafter, the flowchart is realized by the CPU executing the control program. The processing steps include initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S140, control value calculation S150, control of AGV S160, and system termination determination S170.
 ステップS110では、システムの初期化を行う。すなわち、外部メモリH14からプログラムを読み込み、情報処理装置10を動作可能な状態にする。また、情報処理装置10に接続された各機器のパラメータ(撮像部110の内部パラメータや焦点距離)や、撮像部110の初期位置姿勢を前時刻位置姿勢としてRAMであるH13に読み込む。また、AGVの各デバイスを起動し、動作・制御可能な状態とする。これらに加え、通信I/F(H17)を通して移動体管理システムから運行情報を受信し、AGVが向かうべき目的地の三次元座標を受信し、保持部1130に保持する。 In step S110, the system is initialized. That is, the program is read from the external memory H14, and the information processing apparatus 10 is made operable. In addition, the parameters (internal parameters and focal length of the imaging unit 110) of each device connected to the information processing apparatus 10, and the initial position and orientation of the imaging unit 110 are read as a previous time position and orientation in H13 which is a RAM. In addition, each device of AGV is started, and it is put in the state where it can operate and control. In addition to these, the operation information is received from the mobile management system through the communication I / F (H17), the three-dimensional coordinates of the destination to which the AGV should head is received, and held in the holding unit 1130.
 ステップS120では、撮像部110が視覚情報を取得し、入力部1110に入力する。本実施形態において視覚情報とはデプスマップのことであり、前述の方法で撮像部110がデプスマップを取得してあるものとする。つまり、デプスマップとは図4におけるD154dのことである。 In step S120, the imaging unit 110 acquires visual information and inputs the visual information to the input unit 1110. In the present embodiment, visual information is a depth map, and it is assumed that the imaging unit 110 has acquired the depth map by the method described above. That is, the depth map is D154 d in FIG.
 ステップS130では、入力部1110が、撮像部110が取得したデプスマップを取得する。なお、本実施形態においては、デプスマップは各画素の奥行き値を格納した二次元配列リストのことである。 In step S130, the input unit 1110 acquires the depth map acquired by the imaging unit 110. In the present embodiment, the depth map is a two-dimensional array list storing the depth value of each pixel.
 In step S140, the calculation unit 1120 calculates the position and orientation of the imaging unit 110 using the depth map input by the input unit 1110 and the map information held by the holding unit 1130. Specifically, a three-dimensional point group defined in the imaging coordinate system is first calculated from the depth map. The three-dimensional point (X_t, Y_t, Z_t) is calculated by Equation 1 from the image coordinates (u_t, v_t), the internal parameters (f_x, f_y, c_x, c_y) of the imaging unit 110, and the depth value D of the corresponding pixel of the depth map.
[Equation 1]
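The equation image itself is not reproduced in this text. The standard pinhole back-projection consistent with the variables named above would read as follows; this is a reconstruction under that assumption, not a verbatim copy of Equation 1.

```latex
X_t = \frac{(u_t - c_x)\,D}{f_x}, \qquad
Y_t = \frac{(v_t - c_y)\,D}{f_y}, \qquad
Z_t = D
```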
 次に、撮像部110の前時刻位置姿勢を用いて三次元点群を前時刻位置姿勢座標系に座標変換する。つまり三次元点群に前時刻位置姿勢の行列を掛け合わせる。算出した三次元点群と保持部1130が保持するマップ情報のポイントクラウドの各三次元点の最近傍の点同士の距離の和が小さくなるように位置姿勢を算出する。具体的には、ICP(Iterative Closest Point)アルゴリズムを用いて前時刻位置姿勢に対する撮像部110の位置姿勢を算出する。最後に、世界座標系に変換して、世界座標系における位置姿勢を制御部1140に出力する。なお、算出した位置姿勢はRAMであるH13に前時刻位置姿勢として上書きして保持する。 Next, using the previous time position and orientation of the imaging unit 110, coordinate conversion of the three-dimensional point group is performed to the previous time position and orientation coordinate system. That is, the three-dimensional point group is multiplied by the matrix of the previous time position and orientation. The position and orientation are calculated such that the sum of the distances between the nearest three-dimensional points of the calculated three-dimensional point group and the point cloud of the map information held by the holding unit 1130 is reduced. Specifically, the position and orientation of the imaging unit 110 with respect to the previous time position and orientation are calculated using an ICP (Iterative Closest Point) algorithm. Finally, it is converted into the world coordinate system, and the position and orientation in the world coordinate system are output to the control unit 1140. The calculated position and orientation are stored over the H13, which is the RAM, as the previous time position and orientation.
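A highly simplified sketch of the alignment in step S140: the depth-map points (already moved into the previous-pose frame) are aligned to the map point cloud by iterating nearest-neighbour matching and a least-squares rigid fit. This is a textbook point-to-point ICP written with NumPy/SciPy, not the patented implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst points."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(source, target, iterations=20):
    """Align 'source' (points from the current depth map) to 'target'
    (the map point cloud). Returns a 4x4 incremental pose update."""
    src = np.asarray(source, dtype=float).copy()
    dst = np.asarray(target, dtype=float)
    tree = cKDTree(dst)
    T_total = np.eye(4)
    for _ in range(iterations):
        _, idx = tree.query(src)          # nearest map point for each source point
        R, t = best_fit_transform(src, dst[idx])
        src = src @ R.T + t
        T_step = np.eye(4)
        T_step[:3, :3], T_step[:3, 3] = R, t
        T_total = T_step @ T_total
    return T_total
```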
 ステップS150では、制御部1140が、AGVを制御するための制御値を算出する。具体的には、保持部1130が保持する目的地座標と算出部1120が算出した撮像部110の位置姿勢とのユークリッド距離が小さくなるように制御値を算出する。制御部1140が算出した制御値をアクチュエータ120に出力する。 In step S150, the control unit 1140 calculates a control value for controlling the AGV. Specifically, the control value is calculated such that the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 is reduced. The control value calculated by the control unit 1140 is output to the actuator 120.
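Step S150 only requires that the control value shrink the Euclidean distance to the destination; one hypothetical realisation for a differential-drive AGV is sketched below. The gains, speed limit, and command format are assumptions, not values taken from the disclosure.

```python
import math

def compute_control(pose_xy_theta, goal_xy, k_v=0.5, k_w=1.5, v_limit=1.0):
    """Return (linear_velocity, angular_velocity) commands that reduce the
    Euclidean distance to the goal. pose_xy_theta = (x, y, heading)."""
    x, y, theta = pose_xy_theta
    dx, dy = goal_xy[0] - x, goal_xy[1] - y
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - theta
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))  # wrap to [-pi, pi]
    v = min(k_v * distance, v_limit)
    w = k_w * heading_error
    return v, w
```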
 ステップS160では、アクチュエータ120が、制御部1140が算出した制御値を用いてAGVを制御する。 In step S160, the actuator 120 controls the AGV using the control value calculated by the control unit 1140.
 ステップS170では、システムを終了するか否か判定する。具体的には、保持部1130が保持する目的地座標と算出部1120が算出した撮像部110の位置姿勢とのユークリッド距離が所定の閾値以下であれば、目的地に到着したとして終了する。そうでなければステップS120に戻り処理を続ける。 In step S170, it is determined whether to end the system. Specifically, if the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 is equal to or less than a predetermined threshold, the processing ends as having arrived at the destination. If not, the process returns to step S120 and continues processing.
 In the first embodiment, three-dimensional points obtained from the depth map acquired by the imaging unit, in which each light receiving unit on the imaging element is composed of two or more light receiving elements, are used together with the three-dimensional points of the point cloud serving as the map information. The position and orientation of the imaging unit are calculated so that the distances between these three-dimensional points are minimized. By automatically controlling the AGV so that the distance between the calculated position and orientation of the imaging unit and the destination of the AGV is minimized, the AGV can be operated stably and with less effort.
 <変形例>
 実施形態1では、撮像部110がデプスマップD154dを算出し、本情報処理装置における入力部1110がデプスマップを入力していた。変形例として、撮像部110の位置姿勢を算出できれば、入力部1110が入力するのは撮像部110が算出したデプスマップに限らない。具体的には、撮像部110が内部で撮像部110座標系におけるポイントクラウドを算出していれば、入力部1110が、撮像部110が算出したポイントクラウドを入力することができる。このとき、算出部1120は、入力部1110が入力したポイントクラウドを用いて位置姿勢算出を行うことができる。なお、撮像部110が算出するポイントクラウドとは、図4における三次元点群D154のことである。また、入力部1110が、撮像部110が取得した一対の画像D154a、D154b、および撮像部110が保持する焦点距離を入力し、算出部1120が対応点探索およびステレオ法によってデプスマップを求めてもよい。また、それらに加えて入力部1110が、撮像部110が取得したRGB画像やグレー画像である画像を合わせて視覚情報として入力してもよい。つまり、撮像部110が行うデプスマップ算出を、かわりに算出部1120が行うこともできる。
<Modification>
In the first embodiment, the imaging unit 110 calculates the depth map D154d, and the input unit 1110 of the information processing apparatus receives that depth map. As a modification, as long as the position and orientation of the imaging unit 110 can be calculated, what the input unit 1110 receives is not limited to the depth map calculated by the imaging unit 110. Specifically, if the imaging unit 110 internally calculates a point cloud in the imaging unit 110 coordinate system, the input unit 1110 can receive the point cloud calculated by the imaging unit 110, and the calculation unit 1120 can then calculate the position and orientation using that point cloud. The point cloud calculated by the imaging unit 110 is the three-dimensional point group D154c in FIG. 4. Alternatively, the input unit 1110 may receive the pair of images D154a and D154b acquired by the imaging unit 110 and the focal length held by the imaging unit 110, and the calculation unit 1120 may obtain the depth map by corresponding point search and the stereo method. In addition, the input unit 1110 may also receive an RGB image or a gray image acquired by the imaging unit 110 as visual information. In other words, the depth map calculation performed by the imaging unit 110 may instead be performed by the calculation unit 1120.
 撮像部110は、さらに光学系の焦点距離を制御するフォーカス制御機構を備えることができ、このフォーカス制御を本情報処理装置が制御してもよい。例えば、本情報処理装置の制御部1140がフォーカスを調整する制御値(フォーカス値)を算出してもよい。例えば、移動体が移動し視覚画像の見えが変わった時に、デプスマップの平均値や中央値の奥行きに合わせて撮像部110のフォーカスを調整する制御値を算出する。また、本情報処理装置がフォーカスを調整するのではなく、撮像部110内部に構成されたオートフォーカス機構が調節することもできる。フォーカスを調整することでよりピントの合った視覚情報を取得できるため高精度に位置姿勢を算出することができる。なお、撮像部110はフォーカス制御機能が無い構成(フォーカス固定)であってもよい。この場合には撮像部110はフォーカス制御機構を搭載しなくて済むため小型化できる。 The imaging unit 110 may further include a focus control mechanism that controls the focal length of the optical system, and the information processing apparatus may control the focus control. For example, the control unit 1140 of the information processing apparatus may calculate a control value (focus value) for adjusting the focus. For example, when the moving object moves and the appearance of the visual image changes, a control value for adjusting the focus of the imaging unit 110 in accordance with the depth of the average value or the median value of the depth map is calculated. In addition, the present information processing apparatus can adjust the autofocus mechanism formed inside the imaging unit 110 instead of adjusting the focus. By adjusting the focus, it is possible to obtain more focused visual information, and it is possible to calculate the position and orientation with high accuracy. The imaging unit 110 may have a configuration (focus fixed) without the focus control function. In this case, the imaging unit 110 can be downsized because it is not necessary to mount the focus control mechanism.
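For the focus adjustment described above, one possible (hypothetical) realisation is to drive the focus toward the median of the current depth map, as the paragraph suggests; the clamping range and fallback behaviour are assumptions.

```python
import numpy as np

def focus_control_value(depth_map, min_focus=0.3, max_focus=10.0):
    """Pick a focus distance equal to the median of the valid depth values,
    clamped to the lens range. A deliberately simple stand-in for the
    focus-value calculation mentioned in the text."""
    depths = np.asarray(depth_map, dtype=float)
    valid = depths[np.isfinite(depths) & (depths > 0)]
    if valid.size == 0:
        return None   # keep the current focus if no valid depth is available
    return float(np.clip(np.median(valid), min_focus, max_focus))
```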
 撮像部110は、さらに光学系のズームを制御するズーム制御機構を備えることができ、このズーム制御を本情報処理装置が行ってもよい。具体的には、移動体が高速に移動する場合には、制御部1140がズームを広角にして広い視野の視覚情報合を取得するようにズーム値を調整する制御値(調整値)を算出する。また、移動体を高精度に制御したく、撮像部110の位置姿勢を高精度に算出したい場合には、ズームを狭角にして狭い視野の視覚情報合を高解像度で取得するようにズーム値を調整する制御値(調整値)を算出する。このように必要に応じてズーム値を変えることで、安定して、高精度に撮像部110の位置姿勢を算出することができる。このため、安定して、高精度に移動体を制御することができる。 The imaging unit 110 may further include a zoom control mechanism that controls the zoom of the optical system, and the information processing apparatus may perform this zoom control. Specifically, when the moving object moves at high speed, the control unit 1140 calculates a control value (adjustment value) for adjusting the zoom value so that the zoom is wide angle and the visual information combination of the wide field of view is acquired. . In addition, when it is desired to control the moving object with high accuracy and to calculate the position and orientation of the imaging unit 110 with high accuracy, the zoom value is set so that the zoom is narrow and the visual information combination of the narrow field of view is acquired with high resolution. The control value (adjustment value) to adjust is calculated. Thus, by changing the zoom value as necessary, the position and orientation of the imaging unit 110 can be stably calculated with high accuracy. Therefore, it is possible to control the moving body stably and with high accuracy.
 本実施形態においては、撮像部110はピンホールカメラモデルに当てはまる光学系を想定して説明したが、撮像部110の位置姿勢、移動体の制御を行うための視覚情報を取得することのできる光学系であればどのような光学装置(レンズ)を用いてもよい。具体的には全天周レンズや魚眼レンズでもよいし双曲面ミラーでもよい。マクロレンズを用いてもよい。例えば、全天周レンズや魚眼レンズを用いると広大な視野の奥行き値を取得でき、位置姿勢推定のロバスト性が向上する。マクロレンズを用いると詳細な位置姿勢を算出することができる。このように、使用するシーンに合わせてユーザがレンズを自由に変更(交換など)することができ、安定して、高精度に撮像部110の位置姿勢を算出することができる。また、安定して、高精度に移動体を制御することができる。 In the present embodiment, although the imaging unit 110 has been described on the assumption that the optical system is applicable to the pinhole camera model, the optical system can acquire the position and orientation of the imaging unit 110 and visual information for controlling the moving object. Any optical device (lens) may be used as long as it is a system. Specifically, it may be a full sky lens, a fisheye lens, or a hyperboloid mirror. A macro lens may be used. For example, if an all-sky lens or a fisheye lens is used, it is possible to acquire a depth value of a wide field of view, and the robustness of position and orientation estimation is improved. A detailed position and orientation can be calculated by using a macro lens. As described above, the user can freely change (exchange and the like) the lens in accordance with the scene to be used, and the position and orientation of the imaging unit 110 can be stably calculated with high accuracy. In addition, the moving body can be controlled stably and with high accuracy.
 When the zoom value or the focal length is changed in this way, the imaging unit 110 reads a control value defined by the drive amount, such as the rotation angle or movement amount, of the optical system control motor provided for controlling the focus and the angle of view, and calculates the focal length by referring to a lookup table (not shown). When the lens is changed, the imaging unit 110 reads the focal length value recorded in the lens through the electronic contacts provided on the lens. A person can also input the focal length to the imaging unit 110 using a UI (not shown). The imaging unit 110 calculates the depth map using the focal length value acquired in this way, and the input unit 1110 of the information processing apparatus receives the focal length value from the imaging unit 110 together with the visual information. The calculation unit 1120 calculates the position and orientation using the depth map and the focal length value input by the input unit 1110. The imaging unit 110 can also use the calculated focal length to compute a point cloud in the imaging unit 110 coordinate system; in that case the input unit 1110 receives the point cloud calculated by the imaging unit 110, and the calculation unit 1120 calculates the position and orientation using that point cloud.
In the present embodiment the map information is a point cloud. However, any information that serves as an index for calculating the position and orientation of the imaging unit 110 may be used. Specifically, it may be a colored point cloud in which the three color values are added to each point. Alternatively, a depth map may be associated with a position and orientation to form a keyframe, and a plurality of keyframes may be held; in that case the position and orientation are calculated so as to minimize the distance between a keyframe's depth map and the depth map acquired by the imaging unit 110. Furthermore, if the input unit 1110 also inputs an image, the calculation unit 1120 may hold the input image in association with the keyframe. A 2D map that associates areas the AGV can pass through with impassable places such as walls may also be held; how the 2D map is used will be described later.
Although the position and orientation calculation in the present embodiment has been described using the ICP algorithm, any method that can calculate the position and orientation may be used. For example, instead of using the point clouds described in this embodiment directly, the calculation unit 1120 may compute mesh models from them and calculate the position and orientation so that the distance between corresponding faces is minimized. Alternatively, three-dimensional edges corresponding to discontinuities may be extracted from the depth map and the point cloud, and the position and orientation calculated so that the distance between these three-dimensional edges is minimized. If the input unit 1110 also inputs an image, the calculation unit 1120 can additionally use the input image to calculate the position and orientation.
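For reference, a compact point-to-point ICP of the kind referred to above can be sketched as follows in Python with NumPy/SciPy; it is a simplified illustration without outlier rejection or convergence tests, not the embodiment's exact procedure.

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=20):
    # Aligns source (N,3) to target (M,3); returns a 4x4 rigid transform.
    T = np.eye(4)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iterations):
        _, idx = tree.query(src)              # nearest neighbours in the map
        tgt = target[idx]
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        H = (src - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # enforce a proper rotation
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        step = np.eye(4); step[:3, :3] = R; step[:3, 3] = t
        T = step @ T
    return T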
If the AGV is equipped with sensors such as an inertial sensor (a gyro or an IMU) or an encoder that measures wheel rotation, the input unit 1110 inputs those sensor values, and the calculation unit 1120 can use them together with the visual information to calculate the position and orientation of the imaging unit 110. Related techniques such as the Kalman filter and visual-inertial SLAM are publicly known and can be applied. Combining the visual information of the imaging unit 110 with sensor information in this way makes it possible to calculate the position and orientation robustly and with high accuracy. An inertial sensor such as a gyro or IMU can also be used to reduce blur in the visual information captured by the imaging unit 110: when vertical movement or rotation is detected, it is regarded as vibration of the AGV and the visual information is warped so as to cancel it. This allows the position and orientation to be calculated with high accuracy without being affected by shaking while the AGV is travelling.
In Embodiment 1 the control unit 1140 simply calculated a control value so that the distance between the target position and orientation and the position and orientation calculated by the calculation unit 1120 becomes small. However, the control unit 1140 may calculate any control value that allows the AGV to reach the destination. Specifically, when a depth value of the input geometric information (the depth map) falls below a predetermined distance, the control unit 1140 may calculate a control value that, for example, turns the AGV to the right. Alternatively, treating parts of the map information held by the holding unit 1130 where the point cloud exists as impassable and empty space as passable, the calculation unit 1120 may generate a route by dynamic programming, and the control unit 1140 may calculate control values that follow this route; in this way the AGV can move along walls and reach the destination while avoiding collisions with them. The calculation unit 1120 may also project the point cloud of the map information onto the ground plane in advance to create a 2D map: points onto which the point cloud is projected are impassable locations such as walls and obstacles, while points with no projection are free space that can be traversed, and a route to the destination can be generated from this information by dynamic programming. Furthermore, the calculation unit 1120 may compute a cost map whose stored values decrease as the destination is approached, and the control unit 1140 may calculate the control value using a deep reinforcement learning model, that is, a neural network trained to output a control value from this cost map. By calculating control values that avoid obstacles such as walls while moving, the AGV can be operated stably and safely.
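The route-generation idea above (project the point cloud to the ground, mark occupied cells as impassable, search for a path) can be illustrated with the following Python sketch; it uses breadth-first search in place of the dynamic programming mentioned in the text, and the cell size and grid extent are hypothetical.

import numpy as np
from collections import deque

def build_2d_grid(points, cell=0.1, size=200):
    # Project map points (N,3, map frame) onto the ground; occupied cells are impassable.
    grid = np.zeros((size, size), dtype=bool)
    ij = np.floor(points[:, :2] / cell).astype(int) + size // 2
    ok = (ij >= 0).all(1) & (ij < size).all(1)
    grid[ij[ok, 0], ij[ok, 1]] = True
    return grid

def plan_route(grid, start, goal):
    # Breadth-first search over free cells; returns a list of (row, col) cells or None.
    prev = {start: None}
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + d[0], cur[1] + d[1])
            if (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]
                    and not grid[nxt] and nxt not in prev):
                prev[nxt] = cur
                q.append(nxt)
    return None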
The holding unit 1130 may also be configured not to hold map information. Specifically, from the visual information acquired by the imaging unit 110 at time t and at the immediately preceding time t'', the calculation unit 1120 calculates the position and orientation at time t relative to time t''. By multiplying together the matrices of the position and orientation changes that the calculation unit 1120 computes at every time step in this way, the position and orientation of the imaging unit 110 can be calculated even without map information. With this configuration the position and orientation can be calculated, and the moving body controlled, even on a computer with limited computational resources.
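A short Python illustration of chaining these per-frame relative poses follows; each element of the input list is assumed to be a 4x4 homogeneous transform from the previous frame to the current one.

import numpy as np

def accumulate_pose(relative_poses):
    # Multiply per-frame relative transforms to obtain the pose of the latest
    # frame with respect to the first frame, without any map information.
    pose = np.eye(4)
    for dT in relative_poses:
        pose = pose @ dT
    return pose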
In Embodiment 1 the holding unit 1130 held map information created in advance. However, a SLAM (Simultaneous Localization and Mapping) configuration may be used, in which map information is created from the visual information acquired by the imaging unit 110 and the position and orientation calculated by the calculation unit 1120 while the position and orientation are being estimated. Many SLAM methods have been proposed and can be applied. For example, a Point-Based Fusion algorithm that integrates, over time, the point clouds acquired by the imaging unit 110 at multiple times can be used. A Kinect Fusion algorithm that integrates the measured boundary between objects and free space as voxel data over time can also be used. In addition, RGB-D SLAM algorithms are known that generate a map while tracking feature points detected in the image using the depth values of the depth sensor, and these can be applied as well. Furthermore, in the present embodiment the maps are not limited to those generated within a single time period; for example, multiple maps may be generated at different times and then merged.
In the present embodiment, the map information is not limited to being generated from data acquired by the imaging unit 110 mounted on the moving body 11. For example, the holding unit 1130 may hold a CAD drawing or a map image of the environment as it is, or after converting its data format. The holding unit 1130 may also hold a map based on a CAD drawing or a map image as an initial map and update it with the SLAM technique described above. The map update times may be recorded, and the control unit 1140 may calculate control values that steer the AGV so as to refresh the map at locations where a predetermined time has elapsed. The map may be updated by overwriting, or the initial map may be kept and the differences stored as update information. The map can also be managed in layers, checked on the display unit H16, or reverted to the initial map; performing these operations while viewing the display screen improves convenience.
In Embodiment 1 the moving body operated based on destination coordinates set by the moving body management system 13. The position and orientation and the control values calculated by this information processing apparatus can also be transmitted to the moving body management system through the communication I/F (H17). By having the moving body management system 13 and the process management system 14 refer to the positions, orientations, and control values calculated from the visual information acquired by the imaging unit 110, processes and moving bodies can be managed more efficiently. If the destination coordinates are always obtained online from the movement management system, the holding unit 1130 need not hold them and may instead receive them as needed via the communication I/F.
In Embodiment 1, the information processing system 1 was configured so that the process management system 14 manages the entire factory process, the moving body management system 13 manages the operation information of the moving bodies according to the management status, and the moving body 12 moves according to that operation information. However, any configuration in which the moving body moves based on the visual information acquired by the imaging unit 110 is acceptable. For example, if two predetermined points are held in advance in the holding unit 1130 and the AGV shuttles between them, the process management system and the moving body management system can be omitted.
In the present embodiment, the moving body 12 is not limited to an automated guided vehicle (AGV). For example, the moving body 12 may be a self-driving car or an autonomous mobile robot, and the movement control described in this embodiment may be applied to them.
 特に、前述の情報処理装置を自動車に搭載すれば、自動運転を実現する自動車としても用いることができる。制御部1140が算出した制御値を用いて自動車を移動させる。この場合には、自動車に搭載されるカーナビゲーションシステムからI/O(H18)を通して目的地座標やマップ情報を取得することができる。 In particular, if the above-described information processing apparatus is mounted on a car, it can also be used as a car that realizes automatic driving. The vehicle is moved using the control value calculated by the control unit 1140. In this case, it is possible to acquire destination coordinates and map information from the car navigation system mounted on a car through the I / O (H18).
The apparatus may also be configured not to control a moving body but to calculate a position and orientation based on the visual information acquired by the imaging unit 110. Specifically, the method of this embodiment can be applied to aligning the real space and a virtual object in a mixed reality system, that is, to measuring the position and orientation of the imaging unit 110 in the real space for use in rendering the virtual object. As an example, consider presenting, on the display of a mobile terminal such as a smartphone or tablet, a 3DCG model aligned with and composited onto the image D154a captured by the imaging unit 110. To realize such an application, the input unit 1110 inputs the image D154a in addition to the depth map D152c acquired by the imaging unit 110. The holding unit 1130 further holds the 3DCG model of the virtual object and the three-dimensional position in the map coordinate system at which the 3DCG model is placed. The calculation unit 1120 composites the 3DCG model onto the image D154a using the position and orientation of the imaging unit 110 calculated as described in Embodiment 1. In this way, a user experiencing mixed reality can hold the mobile terminal and, through its display, stably observe the real space with the virtual object superimposed based on the position and orientation calculated by this information processing apparatus.
[Embodiment 2]
 In Embodiment 1, the position and orientation of the imaging unit were calculated using the depth map acquired by the imaging unit. An imaging unit based on DAF (Dual Pixel Auto Focus) can measure a specific distance range from the imaging unit with particularly high accuracy. In Embodiment 2, therefore, depth values are also computed by motion stereo for regions whose distance from the imaging unit lies outside that specific range; this further improves the accuracy of the depth map acquired by the imaging unit, so that the position and orientation can be calculated stably and with high accuracy.
The configuration of the apparatus in Embodiment 2 is the same as that of FIG. 2, which shows the configuration of the information processing apparatus 10 described in Embodiment 1, and is therefore omitted. It differs from Embodiment 1 in that the input unit 1110 inputs visual information to the holding unit 1130 and the holding unit 1130 holds that visual information, and in that the calculation unit 1120 also uses the visual information held by the holding unit 1130 to correct the depth map and calculate the position and orientation. The holding unit 1130 is also assumed to hold in advance, as characteristic information of the imaging unit 110, a list that associates depth values of the depth map acquired by the imaging unit 110 with their reliability. The reliability of a depth value is obtained by imaging a flat panel placed at a predetermined distance from the imaging unit 110 beforehand, taking the reciprocal of the error between the true distance and the measured distance, and clipping the result to a value between 0 and 1. The reliability is assumed to have been calculated in advance for various distances; points that could not be measured are given a reliability of 0. In the present embodiment, the visual information acquired by the imaging unit 110 and input by the input unit 1110 consists of an image and a depth map.
 実施形態2における処理全体の手順は、実施形態1で説明した情報処理装置10の処理手順を示す図4と同一であるため、説明を省略する。実施形態1とは、位置姿勢算出ステップS140前にデプスマップの補正ステップが追加される点が異なる。図7は、デプスマップ補正ステップにおける処理手順の詳細を示すフローチャートである。 The procedure of the entire processing in the second embodiment is the same as that in FIG. 4 showing the processing procedure of the information processing apparatus 10 described in the first embodiment, and thus the description will be omitted. The second embodiment differs from the first embodiment in that the depth map correction step is added before the position and orientation calculation step S140. FIG. 7 is a flowchart showing details of the processing procedure in the depth map correction step.
 ステップS2110では、算出部1120が、保持部1130から撮像部110の特性情報を読み込む。 In step S2110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 from the holding unit 1130.
In step S2120, the calculation unit 1120 calculates depth values by motion stereo, using the image held by the holding unit 1130 that was acquired at an arbitrary time t' before the time t at which the imaging unit 110 acquired the current visual information, together with the input image. An image acquired at an arbitrary time t' before time t is hereinafter also referred to as a past image, and a depth map acquired at such a time as a past depth map. Motion stereo is a known technique and various methods can be applied. Motion stereo from two images leaves an ambiguity in the scale of the depth values, but the scale can be determined from the ratio between the past depth map and the depth values computed by motion stereo.
In step S2130, the calculation unit 1120 updates the depth map by a weighted sum, using the reliability associated with each depth value (the characteristic information read in step S2110) and the depth values computed by motion stereo in step S2120. Specifically, with the reliability value around each depth value d of the depth map taken as the weight α, d is combined with the depth value m computed by motion stereo according to Equation 2:

 d_new = αd + (1 − α)m   (Equation 2)

 The depth map is updated using the calculated d_new. When all pixels of the depth map have been updated, the depth map correction step ends, and the processing from step S150 described in Embodiment 1 continues.
As described above, in Embodiment 2 the weight of the depth value acquired by the imaging unit 110 is made large where the imaging unit 110 can acquire the depth value with high accuracy, and the weight of the depth value computed by motion stereo is made large otherwise. As a result, even where the measurement accuracy of the imaging unit 110 drops, the depth is corrected by motion stereo and the depth map can be computed with high accuracy.
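A per-pixel Python sketch of the weighted sum of Equation 2 follows; the arguments are assumed to be aligned HxW arrays, and the reliability is assumed to already lie in the range 0 to 1.

import numpy as np

def fuse_depth(depth_sensor, depth_motion_stereo, reliability):
    # Equation 2: d_new = alpha * d + (1 - alpha) * m, applied per pixel.
    alpha = np.clip(reliability, 0.0, 1.0)
    return alpha * depth_sensor + (1.0 - alpha) * depth_motion_stereo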
<Modification>
 In the present embodiment the reliability used to correct the depth map was computed from the measurement error of the depth values produced by the imaging unit 110 and used as the weight α. However, any method of computing a weight that integrates the depth map acquired by the imaging unit 110 with the depth values computed by motion stereo so as to improve the accuracy of the depth map may be used. For example, the weight may be the reciprocal of the depth multiplied by a predetermined coefficient β. Alternatively, the gradient of the input image may be computed and the inner product of the gradient direction and the arrangement direction of the elements in the imaging unit 110 used as the weight. The input unit 1110 may also receive from the imaging unit 110 the baseline length between the two images D154a and D154b or the baseline length of the parallax image D154f, and the ratio of that baseline to the motion-stereo baseline may be used as the weight. Rather than computing a weight for every pixel as described in this embodiment, only specific pixels may be integrated, or the same weight may be applied to some or all pixels when computing the weighted sum. Motion stereo may also be performed using images and depth maps from multiple past times rather than a single past time.
The AGV can also be controlled so that visual information is obtained that allows the position and orientation to be calculated more accurately and robustly. For example, the control unit 1140 may calculate control values that move the AGV so that the motion-stereo baseline becomes longer; one example is a control value that makes the AGV travel in a weaving path while keeping a distant point within the view of the imaging unit 110. Because the motion-stereo baseline becomes longer, depth values of more distant points can be computed accurately. The control unit 1140 may also calculate control values so that the imaging unit 110 obtains visual information over a wider field of view, specifically control values that make the AGV perform a look-around motion centered on the optical center of the imaging unit 110. Since visual information with a wider field of view can then be acquired, divergence and error in the optimization are reduced and the position and orientation can be calculated reliably.
The input unit 1110 may also receive an image and a position and orientation from another AGV through the communication I/F and compute depth values by motion stereo using the received image and position and orientation together with the image acquired by the imaging unit 110. What is received may be any visual information, such as a depth map, a parallax image, or a three-dimensional point cloud acquired by the imaging unit of the other AGV.
[Embodiment 3]
 In Embodiments 1 and 2, positions, orientations, and control values were calculated based on visual information in which the imaging unit 110 captured the scene. However, depth accuracy can drop on textureless walls and pillars. In Embodiment 3, therefore, a predetermined light pattern is projected onto the scene and captured by the imaging unit 110, thereby improving depth accuracy.
FIG. 8 shows the configuration of the information processing apparatus 30 in the present embodiment. It differs from the information processing apparatus 10 described in Embodiment 1 in that the control unit 1140 additionally calculates and outputs a control value for the projection device 310. The projection device in this embodiment is a projector, mounted so that its optical axis coincides with the optical axis of the imaging unit 110. The pattern projected by the projection device 310 is a random pattern generated so that projected and non-projected regions occur at random. In the present embodiment, the visual information consists of the image D154e and the depth map D154d acquired by the imaging unit 110, which the input unit 1110 receives from the imaging unit 110.
The diagram of the processing procedure in the present embodiment is the same as FIG. 5, which explains the processing procedure of the information processing apparatus 10 described in Embodiment 1, and its description is therefore omitted. This embodiment differs from Embodiment 1 in that, in step S150, the calculation unit 1120 computes a texture-degree value indicating whether the input visual information is poor in texture, and the control unit 1140 calculates a control value that switches the pattern projection ON or OFF based on that texture-degree value.
The procedure by which the control unit 1140 calculates the pattern-projection control value in step S150 is as follows. First, the calculation unit 1120 convolves the input image with a Sobel filter and takes the absolute values to obtain a gradient image. The Sobel filter is a type of filter for computing the first derivative of an image and is well known in the literature. The ratio of pixels in the gradient image whose values are at or above a predetermined gradient threshold is taken as the texture degree. Next, the control unit 1140 calculates a control value that turns the projection device ON if the texture-degree value is at or above a predetermined threshold and OFF if it is below the threshold.
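A Python sketch of the gradient-based texture degree follows, using OpenCV's Sobel operator; the threshold values are hypothetical, and the ON/OFF rule here follows the summary below, i.e., it is assumed that the projector is switched on when the measured texture is scarce.

import numpy as np
import cv2

def texture_degree(gray, grad_threshold=30.0):
    # Ratio of pixels whose Sobel gradient magnitude reaches the threshold.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.abs(gx) + np.abs(gy)
    return float(np.mean(mag >= grad_threshold))

def projector_should_be_on(gray, texture_threshold=0.1):
    # Assumed rule: project the random pattern when the scene is poor in texture.
    return texture_degree(gray) < texture_threshold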
 以上のように、実施形態3では、シーンがテクスチャに乏しい場合には、ランダムなパターン光を投影する。これにより、シーンにランダムな模様が付加されるため、シーンがテクスチャに乏しい場合であっても撮像部がより精度よくデプスマップが取得できる。このため、精度よく位置姿勢を算出することができるようになる。 As described above, in the third embodiment, when the scene is poor in texture, random pattern light is projected. As a result, a random pattern is added to the scene, so that even if the scene is poor in texture, the imaging unit can acquire the depth map more accurately. Therefore, the position and orientation can be calculated with high accuracy.
<Modification>
 In the present embodiment the pattern light was a random pattern. However, any pattern that adds texture to texture-poor regions may be used. For example, a random dot pattern or a striped pattern (or a lattice pattern, for example) may be projected. A striped pattern has the ambiguity that distances inside and outside the modulation wavelength cannot be distinguished, but this ambiguity can be removed by using a Gray-code scheme in which depth values are obtained from input images acquired at multiple times while changing the frequency.
In the present embodiment the control unit 1140 output an ON/OFF control value for the projection, and the projection device 310 switched the projection accordingly. However, any configuration in which the projection device 310 can project the pattern light may be used. For example, the projection device 310 may be configured to start projecting when power is turned on in the initialization step S110. The projection device 310 may also be configured to project onto an arbitrary part of the scene; specifically, the control unit 1140 may switch the projection pattern of the projection device 310 so that it projects only onto regions where the gradient value of the gradient image is below a predetermined threshold. Using the object detection described in Embodiment 5, human eyes may be detected and the control value calculated so that the pattern is projected while avoiding them. Furthermore, not only the ON/OFF of the pattern but also its brightness may be changed: the control unit 1140 may calculate control values so that the projection device 310 projects more brightly onto regions where the depth values of the depth map are large, or so that dark parts of the input image are illuminated more brightly. The pattern may also be changed when the residual error in the iterative calculation by which the calculation unit 1120 computes the position and orientation is at or above a predetermined threshold.
In the present embodiment the texture-degree value was computed from a gradient image obtained with a Sobel filter. Alternatively, the texture-degree value may be computed from a gradient image or edge image obtained with filters such as a Prewitt filter, a Scharr filter, or a Canny edge detector. The high-frequency components obtained by applying a DFT (discrete Fourier transform) to the image may also be used as the texture-degree value. Feature points such as corners in the image may also be detected and their number used as the texture degree.
[Embodiment 4]
 In Embodiments 1 and 2, positions, orientations, and control values were calculated based on visual information in which the imaging unit captured the scene. Embodiment 3 described improving accuracy for texture-poor scenes by projecting pattern light. Embodiment 4 describes a method that additionally uses three-dimensional information representing scene positions measured by another three-dimensional sensor.
FIG. 9 shows the configuration of the information processing apparatus 40 in the present embodiment. It differs from Embodiment 1 in that the input unit 1110 of the information processing apparatus 10 described in Embodiment 1 additionally inputs three-dimensional information from the three-dimensional measurement device 410. The three-dimensional measurement device 410 in this embodiment is a 3D LiDAR (light detection and ranging) device, which measures distance from the round-trip time of laser pulses. The input unit 1110 inputs the measurement values acquired by the three-dimensional measurement device as a point cloud. The holding unit holds in advance, as characteristic information, a list associating the depth values of the depth map acquired by the imaging unit 110 with their reliability and a list associating the depth values of the three-dimensional measurement device 410 with their reliability. These reliabilities are assumed to have been calculated in advance, for both the imaging unit 110 and the three-dimensional measurement device 410, by the method described in Embodiment 2.
The procedure of the overall processing in Embodiment 4 is the same as that of FIG. 4, which shows the processing procedure of the information processing apparatus 10 described in Embodiment 1, and its description is therefore omitted. It differs from Embodiment 1 in that a depth map correction step is added before the position and orientation calculation step S140. FIG. 10 is a flowchart showing the details of the processing procedure in the depth map correction step.
 ステップS4110では、算出部1120が、保持部1130から撮像部110、および三次元計測装置410の特性情報を読み込む。 In step S4110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 and the three-dimensional measurement apparatus 410 from the holding unit 1130.
In step S4120, the calculation unit 1120 integrates the depth map computed by the imaging unit 110 with the point cloud measured by the three-dimensional measurement device 410, using the reliabilities associated with the depth values (the characteristic information read in step S4110). Specifically, the depth map can be updated by replacing the value m in Equation 2 with the depth value measured by the three-dimensional measurement device 410. The weight α is calculated by Equation 3, where γ_D denotes the reliability of the depth map and γ_L the reliability of the point cloud at the same location.
(Equation 3)
The depth map is updated by Equation 2 using the calculated weight. When all pixels of the depth map have been updated, the depth map correction step ends, and the processing from step S150 described in Embodiment 1 continues.
As described above, in Embodiment 4 the weight of the depth value acquired by the imaging unit is made large where the imaging unit can acquire the depth value with high accuracy, and the weight of the depth value acquired by the three-dimensional measurement device is made large where the three-dimensional measurement device can acquire it with high accuracy. The depth map is thus computed from whichever of the imaging unit and the three-dimensional measurement device measures each depth value more accurately, and the position and orientation can be calculated with high accuracy.
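For illustration only, a per-pixel fusion is sketched in Python below. Since the exact form of Equation 3 appears as an image in the original and is not reproduced in this text, the normalization alpha = γ_D / (γ_D + γ_L) used here is an assumption, not the patent's formula, and the LiDAR depth is assumed to have been re-projected into the camera's depth map beforehand.

import numpy as np

def fuse_with_lidar(depth_cam, depth_lidar, rel_cam, rel_lidar):
    # ASSUMED weighting (illustrative only): alpha = rel_cam / (rel_cam + rel_lidar).
    # Where both reliabilities are zero, the camera depth is kept unchanged.
    denom = rel_cam + rel_lidar
    alpha = np.where(denom > 0, rel_cam / np.maximum(denom, 1e-9), 1.0)
    return alpha * depth_cam + (1.0 - alpha) * depth_lidar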
<Modification>
 In the present embodiment a method using a 3D LiDAR as the three-dimensional measurement device 410 was described. The three-dimensional measurement device 410 is not limited to this, and may be anything that can measure three-dimensional information capable of improving the accuracy of the visual information acquired by the imaging unit 110. For example, it may be a TOF (Time of Flight) range camera or a stereo camera with two cameras. A stereo configuration may also be used in which a monocular camera separate from the DAF-based imaging unit 110 is arranged so as to share the optical axis of the imaging unit 110. Alternatively, a further imaging unit 110 with different reliability characteristics may be mounted and treated as the three-dimensional measurement device 410, with the depth map updated in the same way.
[Embodiment 5]
 In Embodiments 1 and 2, positions, orientations, and control values were calculated based on visual information in which the imaging unit 110 captured the scene. In Embodiment 3, a predetermined light pattern was projected onto the scene. In Embodiment 4, the three-dimensional shape measured by a three-dimensional measurement device was additionally used. In Embodiment 5, an object is detected from the visual information and used to control the moving body. In particular, this embodiment addresses the case where the AGV carries a load and, on arriving at the destination, must stop precisely at a predetermined position relative to a shelf or belt conveyor. A method is described in which the position and orientation of an object such as a shelf or belt conveyor imaged by the imaging unit 110 are calculated, giving a precise position and orientation with which the AGV is controlled. In this embodiment, unless otherwise noted, the feature information of an object means the position and orientation of the object.
The configuration of the apparatus in Embodiment 5 is the same as that of FIG. 2, which shows the configuration of the information processing apparatus 10 described in Embodiment 1, and is therefore omitted. It differs from Embodiment 1 in the following points: the calculation unit 1120 additionally performs object detection from the visual information; the control unit 1140 controls the moving body so that the detected object appears at a predetermined position in the visual information; and the holding unit 1130 holds an object model for object detection, together with a target position and orientation relative to the object that specifies the pose the AGV should take with respect to the object when it arrives at the destination.
The object model consists of a CAD model representing the shape of the object and a list storing PPF (Point Pair Feature) feature information, whose feature value is the relative position of a pair of three-dimensional points, each with a normal, among the object's three-dimensional feature points.
The procedure of the overall processing in Embodiment 5 is the same as that of FIG. 5, which shows the processing procedure of the information processing apparatus 10 described in Embodiment 1, and its description is therefore omitted. However, it differs from Embodiment 1 in that an object detection step is added after the position and orientation calculation step S140. FIG. 11 is a flowchart explaining the details of the object detection step.
 ステップS5110では、算出部1120が、保持部1130が保持する物体モデルを読み込む。 In step S5110, the calculation unit 1120 reads the object model held by the holding unit 1130.
In step S5120, the calculation unit 1120 detects where in the visual information an object matching the object model appears, using the depth map. Specifically, PPF features are first computed from the depth map, and the PPFs detected from the depth map are matched against the PPFs of the object model to compute an initial value of the object's position and orientation relative to the imaging unit 110.
In step S5130, using the object's position and orientation relative to the imaging unit 110 computed by the calculation unit 1120 as the initial value, the position and orientation of the object relative to the imaging unit 110 are refined by the ICP algorithm. At the same time, the residual with respect to the target position and orientation relative to the object held by the holding unit 1130 is calculated. The calculation unit 1120 passes the calculated residual to the control unit 1140, and the object detection step ends.
 図5におけるステップS150においては、制御部1140が、算出部1120が算出した物体の位置姿勢の残差が小さくなる方向にAGVが移動するようにアクチュエータ120の制御値を算出する。 In step S150 in FIG. 5, the control unit 1140 calculates the control value of the actuator 120 so that the AGV moves in the direction in which the residual of the position and orientation of the object calculated by the calculation unit 1120 decreases.
In Embodiment 5, an object appearing in the depth map acquired by the imaging unit, in which each light-receiving portion on the image sensor is composed of two or more light-receiving elements, is detected, and the object's position and orientation are calculated by model fitting. The AGV is then controlled so that the difference between the position and orientation relative to the object given in advance and the detected object's position and orientation becomes small, that is, the AGV is controlled so as to be precisely aligned with the object. By calculating the position and orientation relative to an object whose shape is known in advance in this way, the position and orientation can be calculated with high accuracy, and the AGV can be controlled with high accuracy.
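A small Python sketch of the pose residual driving this alignment follows; both poses are assumed to be 4x4 homogeneous transforms expressed in the camera frame, and the stopping tolerances are hypothetical values.

import numpy as np

def pose_residual(T_target, T_detected):
    # Difference between the stored target pose relative to the object and the
    # detected object pose; returns translation error [m] and yaw error [rad].
    dT = np.linalg.inv(T_target) @ T_detected
    trans_err = float(np.linalg.norm(dT[:3, 3]))
    yaw_err = float(np.arctan2(dT[1, 0], dT[0, 0]))
    return trans_err, yaw_err

def aligned(T_target, T_detected, tol_t=0.01, tol_r=0.01):
    # Hypothetical rule: stop issuing correction commands once both errors are small.
    trans_err, yaw_err = pose_residual(T_target, T_detected)
    return trans_err < tol_t and abs(yaw_err) < tol_r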
<Modification>
 In the present embodiment PPF features were used to detect the object. However, any method capable of detecting the object may be used. For example, a SHOT feature may be used, whose feature value is a histogram of the inner products between the normal of a three-dimensional point and the normals of surrounding three-dimensional points. A Spin Image feature may also be used, in which the surrounding three-dimensional points are projected onto a cylindrical surface whose axis is the normal vector of a given three-dimensional point. As a method of detecting the object without hand-crafted features, a machine-learned model may also be used; specifically, a neural network trained to return 1 for object regions and 0 for non-object regions when given a depth map can be used as the learned model. If a learned model trained to output the six degrees of freedom of the object directly from the depth map is available, steps S5110 to S5130 may be combined and the object's position and orientation calculated at once.
 本実施形態では、物体の例として棚やベルトコンベアとしていた。しかしながらAGVを停止させたときに撮像部110が観測でき、相対位置姿勢(相対位置、相対姿勢)が一意に定まる物体であればなんでもよい。例えば、位置姿勢の指標として工場の天井に張り付けた三次元のマーカ(具体的には、3Dプリンタで印刷した任意の凹凸を持つ任意形形状の物体)でもよい。また、AGVが充電式で充電ステーションに停止する場合には充電ステーションの形状の3DCADモデルでもよい。また、CADモデルでなくとも、目標位置姿勢であらかじめ停止した際のデプスマップを物体モデルとして用いてもよい。このとき、AGV運用時には保持したデプスマップと入力部1110が入力したデプスマップとの間の位置姿勢誤差が小さくなるようにAGVを制御すればよい。このようにするとCADモデルの作成の手間なく物体モデルを生成できる。 In this embodiment, a shelf or a belt conveyor is used as an example of the object. However, it may be any object as long as the imaging unit 110 can observe when the AGV is stopped and the relative position and orientation (relative position and relative orientation) are uniquely determined. For example, a three-dimensional marker (specifically, an arbitrary-shaped object having arbitrary asperities printed by a 3D printer) affixed to a ceiling of a factory as an index of position and orientation may be used. When the AGV is rechargeable and stops at the charging station, it may be a 3D CAD model of the shape of the charging station. Also, instead of using a CAD model, a depth map may be used as an object model when stopping at a target position and orientation in advance. At this time, during AGV operation, AGV may be controlled so that the position and orientation error between the held depth map and the depth map input by input unit 1110 is reduced. In this way, an object model can be generated without the trouble of creating a CAD model.
In the present embodiment, a method of detecting an object and fitting a model to it was illustrated for the purpose of calculating the position and orientation for precise positioning of the AGV. However, this is not limited to precise position and orientation calculation; it may also be used for collision avoidance and for detecting the positions and orientations of other AGVs. Specifically, suppose a CAD model of the AGV's shape is held as an object model and the calculation unit 1120 finds another AGV by object detection. The control unit 1140 can then calculate control values that avoid the other AGV's coordinates and thus avoid colliding with it. When another AGV is detected, an alert may be presented, or the other AGV may be instructed to clear this AGV's route. If the other AGV is stationary, it may be regarded as having stopped with a depleted battery, and the control unit 1140 may calculate control values to approach it, couple with it, and tow it to the charging station. When cables are run along passages in the factory, the calculation unit 1120 may detect the cables and the control unit 1140 may calculate control values that detour around them so they are not run over; if the ground is uneven, control values may be calculated to avoid the uneven areas. Furthermore, if labels such as "no entry" or "recommended route" are associated with each object model, whether the AGV may pass can easily be set simply by placing the corresponding object in the scene.
 本実施形態においては、物体モデルとはCADモデルであった。しかしながら物体の位置姿勢を算出できればモデルは何でもよい。例えば、対象物を複数視点で撮影したステレオ画像からStructure From Motionアルゴリズムによって三次元復元して生成したメッシュモデルであってもよい。また、RGB-Dセンサで複数視点から撮影したデプスマップを統合して作成したポリゴンモデルであってもよい。また、前述のような物体を検出するように学習したニューラルネットワークモデルとして例えばCNN(Convolutional Neural Network)を用いてもよい。 In the present embodiment, the object model is a CAD model. However, any model may be used as long as the position and orientation of the object can be calculated. For example, it may be a mesh model generated by three-dimensional reconstruction of a target object from stereo images taken at a plurality of viewpoints by the Structure From Motion algorithm. In addition, it may be a polygon model created by integrating depth maps captured from a plurality of viewpoints with an RGB-D sensor. Also, a CNN (Convolutional Neural Network) may be used as a neural network model learned to detect an object as described above.
The imaging unit 110 may also image the object the AGV is carrying, the calculation unit 1120 may recognize it, and the control unit 1140 may calculate control values according to the type of the loaded object. Specifically, if the loaded object is fragile, control values are calculated so that the AGV moves at low speed. If a list associating each object with a destination position and orientation is held in advance in the holding unit 1130, control values may be calculated so as to move the AGV to the destination associated with the loaded object.
If the imaging unit 110 detects an object within a predetermined distance range from the moving body, it may be judged that an object that should be carried has fallen, and an alert may be displayed. If a robot arm (not shown) is mounted on the AGV, the control unit 1140 may calculate control values for the robot arm so that it picks up the object.
[Embodiment 6]
 Embodiments 1 to 4 described methods of calculating the position and orientation stably and with high accuracy from the visual information acquired by the imaging unit 110 and of calculating control values for the moving body. Embodiment 5 described a method of detecting an object from the visual information and using it to control the moving body. Embodiment 6 describes, as an additional function of Embodiments 1 to 5, a method of controlling the AGV and generating map information stably and with high accuracy using the result of segmenting the input visual information into regions. In particular, this embodiment illustrates how the method is applied when generating map information. Registering in the map information stationary objects whose positions and orientations do not change over time, and using these to calculate the position and orientation, improves robustness to changes in the scene. The visual information is therefore semantically segmented to determine the object type of each pixel, and a method of generating hierarchical map information using per-object-type "stationariness" information computed in advance, as well as a position and orientation estimation method using it, are described. In this embodiment, unless otherwise noted, the feature information of an object means the type of the object.
The configuration of the apparatus in Embodiment 6 is the same as that of FIG. 2, which shows the configuration of the information processing apparatus 10 described in Embodiment 1, and is therefore omitted. The calculation unit 1120 additionally performs semantic segmentation of the visual information and uses the result to generate map information hierarchically. The hierarchical map information in this embodiment is a point cloud composed of four layers: (1) a factory layout CAD model, (2) a stationary object map, (3) a fixture map, and (4) a moving object map. The holding unit 1130 holds (1), the factory layout CAD model, in the external memory H14. The position and orientation are calculated using the hierarchically created map information; the calculation method is described later. In the present embodiment, the visual information acquired by the imaging unit 110 and input by the input unit 1110 consists of an image and a depth map. The holding unit 1130 also holds a CNN, a learned model trained to output, for each object type, a mask image indicating whether each pixel belongs to that object when an image is input. Together with the learned model, it holds a lookup table specifying which of the layers (2) to (4) each object type belongs to, so that specifying an object type reveals its layer.
The overall processing procedure in the sixth embodiment is the same as that of FIG. 4, which shows the processing procedure of the information processing apparatus 10 described in the first embodiment, and its description is therefore omitted. This embodiment differs from the first embodiment in that, when calculating the position and orientation, the calculation unit 1120 takes into account the layer of the map information held by the holding unit 1130. It also differs in that a region segmentation and map generation step is added after the position and orientation calculation step S140. Details of these processes are described later.
In step S140, the calculation unit 1120 calculates the position and orientation by assigning, for each layer of the map information held by the holding unit 1130, a weight to the point cloud that determines its contribution to the position and orientation calculation. Specifically, when the layers (1) to (4) of the example in this embodiment are held, the weights are made progressively smaller from layer (1), the least movable map information, to layer (4).
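As one illustration of this weighting, the following Python sketch (not part of the original disclosure; the numeric weights and names are assumptions) shows how per-layer weights could scale alignment residuals during pose estimation, so that points from more movable layers contribute less.

    # Minimal sketch: per-layer weights down-weight more movable map layers
    # when estimating the camera pose. Layer indices follow the example in
    # the text: 1 = CAD layout, 2 = stationary, 3 = fixtures, 4 = moving
    # objects; the numeric weights are illustrative only.
    import numpy as np

    LAYER_WEIGHTS = {1: 1.0, 2: 0.8, 3: 0.4, 4: 0.1}  # assumed values

    def weighted_pose_error(residuals, layer_ids):
        """Sum of squared point-to-map residuals, scaled by layer weight.

        residuals : (N,) array of per-point alignment errors
        layer_ids : (N,) array of map-layer indices (1..4) per matched point
        """
        w = np.array([LAYER_WEIGHTS[int(l)] for l in layer_ids])
        return float(np.sum(w * residuals ** 2))

A pose estimator would minimize this weighted error instead of the unweighted one, which is one simple way to realize the layer-dependent contribution described above.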
FIG. 12 is a flowchart illustrating the details of the region segmentation and map generation step. This step is added immediately after the position and orientation calculation step S140 in FIG. 5 and is executed there.
In step S6110, the calculation unit 1120 semantically segments the input image. Many methods for semantic segmentation have been proposed and can be employed here; any method that semantically segments an image may be used. These methods yield, for each object type, a mask image in which each pixel is marked as belonging or not belonging to that object.
In step S6120, the depth map is segmented into regions. Specifically, a normal is first calculated for each pixel of the depth map, and normal edges are detected where the inner product with the surrounding normals falls below a predetermined value. The depth map is then segmented by assigning a different label to each region bounded by these normal edges, yielding a region-segmented image.
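The following Python sketch (an assumption-laden illustration, not the patented implementation) shows one way to realize this step: estimate normals from the back-projected depth points, mark pixels whose normals deviate from their neighborhood, and label the connected smooth regions. The 5x5 window and the 0.95 dot-product threshold are placeholders.

    # Illustrative sketch, assuming a dense depth map already back-projected
    # to one 3D point per pixel (points[y, x] = (X, Y, Z)).
    import numpy as np
    from scipy import ndimage

    def normals_from_points(points):
        # Gradients along image rows/columns, then their cross product.
        dx = np.gradient(points, axis=1)
        dy = np.gradient(points, axis=0)
        n = np.cross(dx, dy)
        n /= (np.linalg.norm(n, axis=2, keepdims=True) + 1e-9)
        return n

    def segment_by_normals(points, dot_threshold=0.95):
        n = normals_from_points(points)
        # A pixel is a "normal edge" if its normal deviates from the local
        # mean normal; smooth pixels form the interior of regions.
        mean_n = ndimage.uniform_filter(n, size=(5, 5, 1))
        mean_n /= (np.linalg.norm(mean_n, axis=2, keepdims=True) + 1e-9)
        smooth = (n * mean_n).sum(axis=2) > dot_threshold
        # Connected components of smooth pixels become region labels; edge
        # pixels receive label 0.
        labels, _ = ndimage.label(smooth)
        return labels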
In step S6130, the calculation unit 1120 semantically segments the point cloud based on the mask images obtained by semantically segmenting the input image and the region-segmented image obtained from the depth map. Specifically, the ratio N_i,j expressing the inclusion relationship between each depth map region S_Dj and each mask object region S_Mi is calculated by Expression 4, where i is the object type and j is the label of the depth map region.
[Expression 4: equation shown as an image in the original publication]
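Since the equation itself is only available as an image, the following is merely a plausible form consistent with the surrounding description (the overlap between the depth map region and the mask region, normalized by the depth map region); it is an assumption, not a reproduction of Expression 4:

    N_{i,j} = \frac{\lvert S_{D_j} \cap S_{M_i} \rvert}{\lvert S_{D_j} \rvert}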
Next, the object type i is assigned to each depth map region S_Dj for which N_i,j is equal to or greater than a predetermined threshold. Pixels to which no object type has been assigned are given a background label. In this way, an object type is assigned to each pixel of the depth map.
In step S6140, the calculation unit 1120 hierarchically generates map information based on the object type labels assigned to the depth map in step S6130. Specifically, the lookup table is consulted for each object type label in the depth map, and the three-dimensional point group obtained from the depth map is stored in the corresponding layer of the map information held by the holding unit 1130. When the storage is completed, the region segmentation and map generation step ends.
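A hedged sketch of this routing, in Python: the lookup table contents and all names are assumptions chosen only to illustrate how labeled 3D points could be appended to the appropriate map layer.

    # Hedged sketch of step S6140: route 3D points into map layers using a
    # lookup table from object-type label to layer index.
    OBJECT_TYPE_TO_LAYER = {          # illustrative lookup table
        "wall": 2, "pillar": 2,       # stationary objects
        "shelf": 3, "desk": 3,        # fixtures
        "person": 4, "agv": 4,        # moving objects
    }

    def add_points_to_layers(points_3d, labels, map_layers):
        """points_3d: list of (x, y, z); labels: object-type label per point;
        map_layers: dict layer_index -> list of points (hierarchical map)."""
        for p, label in zip(points_3d, labels):
            layer = OBJECT_TYPE_TO_LAYER.get(label)   # background -> None
            if layer is not None:
                map_layers.setdefault(layer, []).append(p)
        return map_layers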
As described above, in the sixth embodiment, semantically segmenting the depth map makes it possible to register in the map information non-moving objects, which are suitable for position and orientation calculation, separately from moving objects, which are not. Using the layered map information, smaller weights are assigned to objects that move more, and the position and orientation are calculated according to the assigned weights. This allows the position and orientation to be calculated more stably and robustly.
<Modification>
In this embodiment, the layers (1) to (4) are used. However, any configuration with multiple layers according to how movable the objects are is acceptable; the holding unit 1130 may hold only an arbitrary subset of the layers (1) to (4). In addition, layers for specific objects (an AGV layer, a person layer), a pillar layer, or a landmark layer (3D markers or charging stations) may also be held.
In this embodiment, the semantically segmented depth map was used to generate map information and calculate the position and orientation. The control unit 1140 may also calculate control values using the semantically segmented depth map. Specifically, when people or other AGVs are detected in the semantic segmentation, the control unit 1140 can calculate control values so as to avoid them, allowing the AGV to be operated safely. The control unit 1140 may also calculate control values that make the AGV follow a person or another AGV; in that case the AGV can operate even without map information. Furthermore, the calculation unit 1120 may recognize human gestures based on the semantic segmentation result, and the control unit 1140 may calculate control values accordingly. For example, image regions are labeled for body parts such as arms, fingers, head, torso, and legs, and gestures are recognized from their relative positions. When a beckoning gesture is recognized, a control value is calculated to move closer to the person; when a pointing gesture is recognized, a control value is calculated to move in the pointed direction. Recognizing human gestures in this way lets the user move the AGV without directly operating it with a controller, so the AGV can be operated with little effort.
The control unit 1140 may also calculate control values according to the object type detected by the method of this embodiment. Specifically, the AGV is controlled to stop if the object is a person and to steer around it if the object is another AGV. This allows the AGV to be operated safely, never colliding with people, while efficiently avoiding non-human obstacles.
In this embodiment, the AGV passively segments objects. However, the AGV may also instruct people so that moving objects are excluded. Specifically, if the calculation unit 1120 detects a person while map information is being generated, the control unit 1140 calculates a control value that outputs, through a speaker (not shown), a voice message asking the person to move away. In this way, map information can be generated with moving objects excluded.
In this embodiment, semantic segmentation was performed to identify object types. However, the calculation unit 1120 may generate map information and calculate the position and orientation, and the control unit 1140 may calculate control values, without identifying object types; that is, steps S6110 and S6120 in FIG. 12 may be removed. For example, the depth map may be segmented by height above the ground, and pixels higher than the AGV are ignored when calculating control values; in other words, point clouds at heights where the AGV cannot collide are not used for route generation. This reduces the number of points to be processed and allows control values to be calculated quickly. The segmentation may also be based on planarity. In that case, three-dimensional edges, which contribute strongly to the position and orientation calculation, can be used preferentially (planes that leave ambiguity in the position and orientation are excluded from processing), reducing calculation time and improving robustness.
In this embodiment, the more movable an object in the map information is, the smaller its weight, reducing its contribution to the position and orientation calculation. Alternatively, even if the map information has no layer structure, a weight can be calculated for each pixel of the depth map based on its semantic segmentation result, and the position and orientation can be calculated using these weights. After the input unit 1110 inputs the depth map, the calculation unit 1120 semantically segments it following steps S6110 to S6130, determines a weight for each pixel by consulting the lookup table with its object type label, and then calculates the position and orientation in step S140 taking the weights into account. In this way, the influence of moving objects on the position and orientation calculation can be reduced without giving the map a layer structure, which keeps the map small.
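A minimal sketch of this layer-free variant follows; the label-to-weight table and function names are assumptions, not taken from the publication.

    # Weight each depth-map pixel by how static its semantic label is, then
    # use the weights during pose estimation (e.g. with weighted_pose_error).
    LABEL_WEIGHT = {"background": 1.0, "shelf": 0.5, "person": 0.05, "agv": 0.05}

    def pixel_weights(label_image):
        """label_image: 2D list/array of object-type labels per depth pixel."""
        return [[LABEL_WEIGHT.get(lbl, 1.0) for lbl in row]
                for row in label_image]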
The imaging unit 110 in this embodiment is not limited to an imaging unit in which each light receiving unit on the image sensor is composed of two or more light receiving elements; anything that can acquire three-dimensional depth information, such as a TOF camera or a 3D LiDAR, may be used.
In this embodiment, the holding unit 1130 holds the map information layer by layer. The layers can be checked on the display unit H16 or reverted to the initial map. By checking the layers on the display screen and, if a moving object has been registered in the map, instructing the AGV to generate the map again, the AGV can be operated easily and stably.
In this embodiment, it was assumed that a single AGV creates the map information, but multiple AGVs can also generate map information cooperatively. Specifically, the point clouds of the maps created by the individual AGVs are aligned with the ICP algorithm so that points corresponding to the same location coincide. When integrating individually created map information, the map creation times may be consulted so that the newer map information is retained. The control unit 1140 may also move an AGV that is not currently working so that it generates a map of an area whose map information has not been updated for a while. Generating the map cooperatively with multiple AGVs in this way shortens the time required to generate the map information, making the AGVs easy to operate.
The control values calculated by the control unit 1140 are not limited to the method described in this embodiment, as long as they are calculated using the map information so as to approach the target position and orientation. Specifically, the control values can be determined using a learning model for route generation, for example DQN (Deep Q-Network), a reinforcement learning model. This can be realized by training the reinforcement learning model in advance so that the reward is high when approaching the target position and orientation, low when moving away from it, and low when approaching an obstacle.
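As a small, hedged illustration of the reward shaping described above (not the disclosed training scheme; the coefficients and margin are assumptions), such a reward could be written as:

    def shaped_reward(dist_to_goal, prev_dist_to_goal,
                      dist_to_nearest_obstacle, obstacle_margin=0.5):
        # Reward progress toward the goal; penalize getting close to obstacles.
        reward = prev_dist_to_goal - dist_to_goal
        if dist_to_nearest_obstacle < obstacle_margin:
            reward -= 1.0
        return reward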
The first to sixth embodiments described methods of calculating the position and orientation and the control values using map information, but the map information is not limited to this use. For example, the created map information may be used to simulate AGV transport, and the process management system may generate processes so that transport is carried out efficiently. Similarly, the mobile management system may use the map information to generate AGV operation timings and routes that avoid congestion.
The learning model described above may also be trained together with delivery simulations on the created map. By reproducing and learning situations such as placed obstacles or collisions with people and other AGVs in the simulation, the control unit 1140 can calculate control values stably using the learning model even when similar situations actually occur. In addition, by training in parallel with a method such as A3C (Asynchronous Advantage Actor-Critic), the learning model can be configured to learn the control method efficiently in a short time.
Seventh Embodiment
A UI that can be applied commonly to the first to sixth embodiments is described. It allows the user to check the visual information acquired by the imaging unit, the position and orientation calculated by the calculation unit, object detection results, map information, and so on. Since the AGV moves under automatic control, it can also be controlled by user input. So that the user can check the status of the AGV and also control it, a GUI is displayed on a display device, for example a display, and the user's operations are input through an input device such as a mouse or a touch panel. In this embodiment the display is assumed to be mounted on the AGV, but the configuration is not limited to this: via the communication I/F (H17), the display of the user's mobile terminal or a liquid crystal display connected to the mobile management system can also be used as the display device. Whether or not the display device is mounted on the AGV, the display information can be generated by the information processing apparatus. When a display device not mounted on the AGV is used, a computer attached to the display device may obtain from the information processing apparatus the information needed to generate the display information and generate it there.
The configuration of the apparatus in the seventh embodiment is the same as that of FIG. 2, which shows the configuration of the information processing apparatus 10 described in the first embodiment, and its description is therefore omitted. It differs from the first embodiment in that the calculation unit 1120 generates display information based on the visual information acquired by the imaging unit 110, the position and orientation it calculated, the detected objects, and the control values calculated by the control unit 1140, and presents it on a touch panel display or the like mounted on the AGV. Details of the display information are described later. In this embodiment, the holding unit 1130 is assumed to hold both 2D and 3D map information.
FIG. 13 shows GUI 100, an example of the display information presented by the display device in this embodiment.
G110 is a window for presenting the 2D map information. G120 is a window for presenting the 3D map information. G130 is a window for presenting the image D154e acquired by the imaging unit 110. G140 is a window for presenting the depth map D154d acquired by the imaging unit 110. G150 is a window for presenting display information based on the position and orientation calculated by the calculation unit 1120 as described in the first embodiment, the objects detected as described in the fifth and sixth embodiments, and the control values calculated by the control unit 1140 as described in the first embodiment.
G110 shows an example presentation of the 2D map held by the holding unit 1130. G111 is the AGV on which the imaging unit 110 is mounted; the calculation unit 1120 renders it onto the 2D map based on the position and orientation of the imaging unit (that is, the position and orientation of the AGV). G112 is an example in which an alert is presented as a balloon when there is a possibility of a collision, based on the position and orientation of an object detected by the calculation unit 1120 with the methods of the fifth or sixth embodiment. G113 is an example in which the planned travel route of the AGV is presented as an arrow based on the control values calculated by the control unit 1140. In FIG. 13, the AGV is heading toward the destination shown at G114. By presenting the 2D map, the position of the AGV, object detection results, and the route in this way, the user can easily grasp the operating status of the AGV. The colors, line widths, and shapes of G111 to G114 may be varied so that the user can grasp the operating status even more easily.
G120 shows an example presentation of the 3D map held by the holding unit 1130. G121 is an example visualizing the result of updating the 3D map held by the holding unit 1130 using the semantic segmentation of the depth map by the calculation unit 1120, as described in the sixth embodiment. Specifically, non-moving objects obtained from the factory CAD data are drawn darker, and moving objects such as other AGVs and people are drawn lighter. The presentation is not limited to shading; each layer may instead be drawn in a different color. The label of the object detected by the calculation unit 1120 is also presented at G122. Presenting the 3D map in this way lets the user grasp the operating status with the height direction taken into account, which a 2D map cannot provide. In addition, an object type found while the AGV was traveling can be searched for without going to the site.
G130 shows an example presentation of the image acquired by the imaging unit 110. In G131 and G132, as described in the sixth embodiment, bounding boxes are superimposed as dotted lines around the other AGV and the person detected by the calculation unit 1120. Solid or double lines may be used instead, and the boxes may be emphasized by changing their color. Superimposing the object detection results on the image acquired by the imaging unit 110 in this way lets the user check the objects detected by the calculation unit 1120 without effort.
G140 shows an example presentation of the depth map acquired by the imaging unit 110. G141 is an example in which the CAD model of an object held by the holding unit 1130, described in the fifth embodiment, is superimposed as a wire frame using the position and orientation of the object calculated by the calculation unit 1120. G142 is an example in which the CAD model of an AGV is superimposed as a wire frame, and G143 an example in which the CAD model of a three-dimensional marker is superimposed. Presenting on the depth map the detected objects whose position and orientation have been calculated lets the user easily grasp what has been detected. When the AGV is controlled using the detected position and orientation of an object, a misalignment between the depth map and the CG also reveals errors in the calculated position and orientation of the AGV. The wire frames may additionally be superimposed on G130; the user then only has to compare the live image with the models, making it even easier and more intuitive to check the position and orientation calculation accuracy of the AGV and the object detection accuracy.
G150 shows a GUI for operating the AGV manually, the values calculated by the calculation unit 1120 and the control unit 1140, and an example presentation of the AGV's operation information. G151 is an emergency stop button; the user can stop the movement of the AGV by touching it with a finger. G152 is a mouse cursor, which can be moved with a mouse or controller (not shown) or by the user's touch operations on the touch panel, and buttons and radio buttons in the GUI can be operated by pressing them. G153 is an example of an AGV controller: by moving the circle inside it up, down, left, and right, the user can drive the AGV forward, backward, left, and right according to those inputs. G154 is an example presentation of the internal state of the AGV, illustrated here as traveling automatically at 0.5 m/s. Operation information such as the time since the AGV started traveling, the remaining time to the destination, and the difference between the expected and scheduled arrival times is also presented. G156 is a GUI for setting the AGV's behavior and the display information; the user can choose, for example, whether to generate map information and whether to present detected objects. G157 is an example presentation of the AGV's operation information, here the position and orientation calculated by the calculation unit 1120, the destination coordinates received from the mobile management system 13, and the name of the article the AGV is transporting. Presenting operation information and GUIs for user input in this way makes the AGV more intuitive to operate.
The processing procedure of the information processing apparatus in the seventh embodiment differs from that of the information processing apparatus 10 described in the first embodiment in that a display information generation step (not shown), in which the calculation unit 1120 generates display information, is added after step S160 of FIG. 5. In the display information generation step, display information is rendered based on the visual information captured by the imaging unit 110, the position and orientation calculated by the calculation unit 1120, the detected objects, and the control values calculated by the control unit 1140, and is output to the display device.
In the seventh embodiment, the calculation unit generates display information based on the visual information acquired by the imaging unit, the position and orientation it calculated, the detected objects, and the control values calculated by the control unit, and presents it on the display. This allows the user to easily check the state of the information processing apparatus. The user can also input AGV control values, various parameters, display modes, and so on, making it easy to change various AGV settings or move the AGV. Presenting a GUI in this way makes the AGV easy to operate.
The display device is not limited to a display. If a projector is mounted on the AGV, display information can also be presented with the projector. If a display device is connected to the mobile management system 13, the display information may be transmitted to the mobile management system 13 via the communication I/F (H17) and presented there. Alternatively, only the information needed to generate the display information may be transmitted, and a computer inside the mobile management system 13 may generate the display information. In this way, the user can check the AGV's operating status and operate it easily without looking at a display device mounted on the AGV.
The display information in this embodiment may be anything that presents information handled by this information processing. Besides the display information described in this embodiment, the residual of the position and orientation calculation and the recognition likelihood values at object detection can also be displayed, as well as the time and frame rate of the position and orientation calculation and the remaining battery level of the AGV. Presenting the information handled by the information processing apparatus in this way allows the user to check its internal state.
The GUI described in this embodiment is only an example; any GUI may be used as long as it lets the user grasp the operating status of the AGV and perform operations (input) on it. For example, the display information can be changed by switching colors, line widths, and solid, broken, or double lines, by zooming, or by hiding unnecessary information. An object model may be displayed as a contour instead of a wire frame, or a translucent polygon model may be superimposed. Changing how the display information is visualized in this way lets the user understand it more intuitively.
The GUI described in this embodiment can also be connected to a server (not shown) via the Internet. With such a configuration, if a problem occurs in the AGV, for example, a person in charge at the AGV manufacturer can obtain the display information via the server and check the state of the AGV without going to the site.
A touch panel was given as an example of the input device, but any input device that accepts user input may be used: a keyboard, a mouse, or gestures (recognized, for example, from the visual information acquired by the imaging unit 110). The mobile management system may also serve as the input device via the network. If a smartphone or tablet terminal is connected via the communication I/F (H17), it can likewise be used as the display device and input device.
What the input device inputs in this embodiment is not limited to what has been described; it may be anything that changes the parameters of the information processing apparatus. For example, user input that changes the upper limit of the moving object's control values (a speed limit) may be accepted, or a destination point that the user clicked on G110 may be input. The user's selection of which models to use and not use for object detection may also be input. The system may further be configured so that, when the user outlines an object on G130 that could not be detected, a learning device (not shown) trains the learning model so that the object can be detected from the visual information of the imaging unit 110.
[Eighth Embodiment]
The sixth embodiment described a method of semantically segmenting the visual information acquired by the imaging unit 110, determining the object type of each pixel, and generating a map, as well as methods of controlling the AGV based on the map and the determined object types. The eighth embodiment further describes a method of recognizing semantic information that differs with the situation even for the same object type, and controlling the AGV based on the recognition result.
In the eighth embodiment, the degree to which objects are stacked, such as products piled up in a factory, is recognized as semantic information from the visual information acquired by the imaging unit 110; that is, the semantic information of objects within the field of view of the imaging unit 110 is recognized. A method of controlling the AGV according to the stacking degree of the objects is then described, in which the AGV is controlled so as to keep a safer distance from stacked objects. In this embodiment, the stacking degree of objects refers to the number of stacked objects or their height.
An occupancy map indicating whether space is occupied by objects is used to calculate the AGV's control values. In this embodiment, the occupancy map is a two-dimensional occupancy grid map in which the scene is divided into grid cells and each cell holds the probability that an obstacle exists there. Here, each cell of the occupancy map holds a value representing the degree of approach rejection for the AGV (a continuous variable from 0 to 1, where values closer to 0 permit passage and values closer to 1 reject it). The AGV is controlled toward the destination so that it does not pass through regions (grid cells in this embodiment) whose approach rejection value is equal to or greater than a predetermined value. The destination is the two-dimensional coordinate of the AGV's goal, included in the operation information acquired from the process management system 12.
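The following short Python sketch (names, grid resolution handling, and the threshold value are assumptions) illustrates this kind of occupancy grid with per-cell approach rejection values and a pass/no-pass test.

    # Illustrative occupancy-grid sketch: each cell stores an approach-
    # rejection value in [0, 1]; cells at or above a threshold are treated
    # as not traversable.
    import numpy as np

    class RejectionGrid:
        def __init__(self, width, height, threshold=0.5):
            self.values = np.zeros((height, width))   # 0 = freely passable
            self.threshold = threshold

        def is_blocked(self, ix, iy):
            return self.values[iy, ix] >= self.threshold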
The information processing system in this embodiment is the same as the system configuration described with reference to FIG. 1 of the first embodiment, and its description is therefore omitted.
FIG. 14 shows the module configuration of the mobile body 12 including the information processing apparatus 80 of the eighth embodiment. The information processing apparatus 80 consists of an input unit 1110, a position and orientation calculation unit 8110, a semantic information recognition unit 8120, and a control unit 8130. The input unit 1110 is connected to the imaging unit 110 mounted on the mobile body 12, and the control unit 8130 is connected to the actuator 120. In addition, a communication device (not shown) communicates information bidirectionally with the mobile management system 13 and exchanges inputs and outputs with the various units of the information processing apparatus 80.
The imaging unit 110, the actuator 120, and the input unit 1110 in this embodiment are the same as in the first embodiment, and their detailed description is omitted.
The position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130 are described in order below.
The position and orientation calculation unit 8110 calculates the position and orientation of the imaging unit 110 based on the depth map input by the input unit 1110, and creates a three-dimensional map of the scene based on the calculated position and orientation. The calculated position and orientation and the three-dimensional map are input to the semantic information recognition unit 8120 and the control unit 8130.
The semantic information recognition unit 8120 estimates, as semantic information, the number and height of stacked objects based on the depth map input by the input unit 1110 and the position and orientation and three-dimensional map calculated by the position and orientation calculation unit 8110. The estimated number and height values are input to the control unit 8130.
The control unit 8130 receives the position and orientation and the three-dimensional map calculated by the position and orientation calculation unit 8110, as well as the number and height of stacked objects estimated as semantic information by the semantic information recognition unit 8120. Based on these values, the control unit 8130 calculates approach rejection values for the objects in the scene and calculates control values for the AGV so that it stays clear of occupancy grid cells whose approach rejection values are equal to or greater than the predetermined value. The control unit 8130 outputs the calculated control values to the actuator 120.
Next, the processing procedure in this embodiment is described. FIG. 15 is a flowchart showing the processing procedure of the information processing apparatus 80. The processing steps are initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S810, semantic information estimation S820, control value calculation S830, control S160, and system termination determination S170. Initialization S110, visual information acquisition S120, visual information input S130, control S160, and system termination determination S170 are the same as in FIG. 5 of the first embodiment, and their description is omitted. The steps of position and orientation calculation S810, semantic information estimation S820, and control value calculation S830 are described in order below.
In step S810, the position and orientation calculation unit 8110 calculates the position and orientation of the imaging unit 110 and creates a three-dimensional map. This is realized with a SLAM (Simultaneous Localization and Mapping) algorithm, which estimates the position and orientation while creating a map from them. Specifically, the position and orientation are calculated with the ICP algorithm so that the difference in depth between the depth maps acquired by the imaging unit 110 at multiple times is minimized, and a three-dimensional map is created with a point-based fusion algorithm that integrates the depth maps over time based on the calculated positions and orientations.
In step S820, the semantic information recognition unit 8120 segments the depth map and the three-dimensional map into regions and calculates, for each region, the number of stacked objects (n) and their height (h). The specific processing is described in order below.
First, the normal direction is calculated from the depth values of each pixel of the depth map and its surrounding pixels. Next, if the inner product with the normal directions of the surrounding pixels is larger than a predetermined value, the pixels are treated as the same object region and assigned a unique region identification label; the depth map is segmented in this way. The region identification labels are then propagated to the points of the three-dimensional map referenced by each pixel of the segmented depth map, thereby segmenting the three-dimensional map as well.
Next, bounding boxes are created by dividing the three-dimensional map at equal intervals in the X-Z directions (the movement plane of the AGV). Each bounding box is scanned vertically (in the Y-axis direction) starting from the ground, the number of distinct labels among the points it contains is counted, and the maximum height of the points above the ground (the X-Z plane) is calculated. The calculated number of regions n and maximum height h are stored in the three-dimensional map for each point.
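A hedged Python sketch of these per-cell statistics follows; the data layout, the 0.5 m cell size, and the function name are assumptions chosen only to illustrate counting labels and tracking the maximum height per X-Z cell.

    # Split the map into X-Z cells and record, for each cell, the number of
    # distinct region labels (n) and the maximum height above the floor (h).
    from collections import defaultdict

    def per_cell_stack_stats(points, labels, cell_size=0.5):
        """points: list of (x, y, z) with y = height; labels: region id per point."""
        cell_labels = defaultdict(set)
        cell_height = defaultdict(float)
        for (x, y, z), label in zip(points, labels):
            key = (int(x // cell_size), int(z // cell_size))
            cell_labels[key].add(label)
            cell_height[key] = max(cell_height[key], y)
        return {k: (len(cell_labels[k]), cell_height[k]) for k in cell_labels}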
In step S830, the control unit 8130 creates an occupancy map based on the three-dimensional map, updates the approach rejection values of the occupancy map from the number of stacked objects (n) and their height (h), and controls the AGV based on the updated occupancy map.
Specifically, the three-dimensional map created in step S810 is first projected onto the X-Z plane, the floor surface corresponding to the AGV's movement plane, to obtain a 2D occupancy map. Next, the approach rejection value of each grid cell of the occupancy map is updated using the distance between the projection of each point of the three-dimensional map onto the X-Z plane and the cell, together with the number of stacked objects (n) and the height (h) stored for that point. Let p_i be the X-Z coordinates of the i-th point P_i projected onto the X-Z plane, and let q_j be the coordinates of the j-th cell Q_j of the occupancy map. The approach rejection value is given by a function that increases with h and n and decreases with distance, for example as follows, where d_ij is the Euclidean distance between p_i and q_j.
[Equation for the approach rejection value: shown as an image in the original publication]
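Because the equation is only available as an image, the following is one plausible form consistent with the description (increasing in h and n, decaying with distance, bounded by 1), stated as an assumption rather than a reproduction of the published formula; alpha, beta, and sigma are assumed scaling constants.

    r_j \leftarrow \max\Bigl(r_j,\ \bigl(1 - e^{-(\alpha h_i + \beta n_i)}\bigr)\, e^{-d_{ij}^{2}/\sigma^{2}}\Bigr) \quad \text{for each point } i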
From the occupancy map and the target and current position and orientation determined as described above, the travel route of the AGV is decided so as to minimize the distance between the AGV and the target position and orientation while avoiding grid cells with high approach rejection values, and the control values are calculated. The control unit 8130 outputs the calculated control values to the actuator 120.
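As one hedged illustration of route selection consistent with this description (a real system would plan in continuous space and weigh the rejection values more finely), a breadth-first search over grid cells that simply skips blocked cells could look like the following; "grid" is assumed to expose is_blocked(ix, iy) and a values array, like the RejectionGrid sketch shown earlier.

    from collections import deque

    def plan_route(grid, start, goal):
        """start/goal: (ix, iy) grid indices; returns a cell path or None."""
        h, w = grid.values.shape
        prev = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = prev[cur]
                return path[::-1]
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (cur[0] + dx, cur[1] + dy)
                if (0 <= nxt[0] < w and 0 <= nxt[1] < h
                        and nxt not in prev and not grid.is_blocked(*nxt)):
                    prev[nxt] = cur
                    queue.append(nxt)
        return None  # no traversable route found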
In the eighth embodiment, the number and height of stacked objects around the AGV are estimated as semantic information, and the AGV is controlled so that it travels at a greater distance from those objects as these values increase. As a result, when there are shelves or pallets loaded with many articles, as in a distribution warehouse, the AGV travels at an extra distance from them, allowing it to be operated more safely.
<Modification 8-1>
The imaging unit 110 in this embodiment may be anything that can acquire an image and a depth map, such as a TOF camera or a stereo camera. A monocular camera such as an RGB camera or a monochrome camera, which acquires only images, may also be used. With a monocular camera, depth is needed for calculating the position and orientation and generating the occupancy map, but this embodiment can still be realized by calculating depth values from the camera's motion. The imaging unit 110 described in the following embodiments is configured in the same way as in this embodiment.
<Modification 8-2>
The approach rejection value of the occupancy map is not limited to the function described in this embodiment; any function may be used as long as it increases with the height and stacking number of the objects and decreases with distance. For example, it may be proportional to the height or stacking number, inversely proportional to the distance, or consider only one of the height and the stacking number. It may also be determined by referring to a list that stores occupancy values according to distance, object height, and stacking number. This list may be stored in advance in the external memory (H14), or it may be held by the mobile management system 13 and downloaded to the information processing apparatus 80 via the communication I/F (H17) as needed.
The occupancy map need not be the occupancy map described in this embodiment; any representation that can determine the presence or absence of objects in space may be used. For example, it may be represented as point clouds of a predetermined radius or approximated by some function. A three-dimensional occupancy map may be used instead of a two-dimensional one; it may be held, for example, in a 3D voxel space (X, Y, Z) or as a signed distance field such as a TSDF (Truncated Signed Distance Function).
In this embodiment, the control values were calculated using an occupancy map whose approach rejection values vary with the height and stacking number of the objects, but any approach that changes the control values based on the semantic information of the objects may be used. For example, the control values may be determined by referring to a list of control methods keyed by object height and stacking number; such a list specifies actions like turning left or decelerating when the number of objects and the stacking count satisfy certain conditions. The AGV may also be controlled by predetermined rules, for example calculating control values that rotate the AGV so that objects of a certain height or stacking count leave its field of view when they are found. Alternatively, the AGV may be controlled by fitting the measured values into a function, for example calculating control values that reduce the speed as the height and stacking count increase.
<Modification 8-3>
In this embodiment, the imaging unit 110 is mounted on the AGV, but it need not be, as long as it can capture the AGV's direction of travel. Specifically, a surveillance camera attached to the ceiling may be used as the imaging unit 110. In that case, the imaging unit 110 captures the AGV, and the position and orientation relative to the imaging unit 110 can be obtained, for example, with the ICP algorithm. A marker may also be attached to the top of the AGV so that the imaging unit 110 detects the marker to obtain the position and orientation. The imaging unit 110 may also detect objects on the AGV's travel route. One or more imaging units 110 may be used.
The position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130 likewise need not be mounted on the AGV. For example, the control unit 8130 may be implemented in the mobile management system 13; this can be realized by having the control unit 8130 send and receive the necessary information via the communication I/F (H17). This removes the need to carry a large computer on the AGV, keeping the AGV light and allowing it to be operated efficiently.
<Modification 8-4>
In this embodiment, the semantic information was the stacking degree of objects. However, the semantic information recognition unit 8120 may recognize any semantic information from which control values for operating the AGV safely and efficiently can be calculated, and the control unit 8130 may calculate control values using that semantic information.
For example, the positions of structures may be recognized as semantic information. Specifically, the degree to which a door, a structure in the factory, is open can be used as semantic information: the AGV travels at a lower speed when the door is open or opening than when it is closed. It can also recognize that an object is suspended from a crane and be controlled so as not to pass underneath it. In this way, the AGV can be operated more safely.
Although the stacking degree was recognized in this embodiment, objects lined up close together may be recognized instead of, or in addition to, stacking. For example, multiple carts are recognized, and if the distance between them is smaller than a predetermined value, control values are calculated so that the AGV keeps at least a predetermined distance from them.
 Alternatively, another AGV and the package located above it may be recognized to determine that a package is loaded on that AGV. If the other AGV is carrying a package, the own AGV avoids it; otherwise the own AGV goes straight, and a control value that makes the other AGV yield may be transmitted to it via the mobile body management system 13. When a package is loaded on the other AGV, the size of the package may additionally be recognized and the control method determined according to that size. By determining in this way whether a package is loaded and how large it is, the AGV that carries no package or the smaller package performs the avoidance maneuver, which reduces the energy and time required for movement and allows the AGVs to be operated efficiently. Furthermore, the type of the object loaded on the other AGV may be recognized, and its value or fragility inferred from the type; if the other AGV carries something valuable or fragile, a control value may be calculated so that the own AGV yields. Recognizing the type of package in this way reduces damage to the packages and allows the AGVs to be operated safely.
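 A minimal sketch of the yield decision described above might look as follows; the function name, the returned action strings, and the tie-breaking rule are hypothetical.

```python
# Minimal sketch (assumption): deciding which AGV yields based on whether the
# other AGV carries a package and on the package sizes.

from typing import Optional

def decide_action(other_has_package: bool,
                  other_package_size_m: Optional[float],
                  own_package_size_m: Optional[float]) -> str:
    """Return 'avoid' if the own AGV should yield, or 'request_other_to_avoid'
    if a yield command should be sent via the mobile body management system 13."""
    if not other_has_package:
        # The unloaded AGV is cheaper to divert: ask it to yield.
        return "request_other_to_avoid"
    if own_package_size_m is None:
        # Own AGV is empty, the other is loaded: own AGV yields.
        return "avoid"
    # Both loaded: the AGV with the smaller package performs the avoidance.
    if own_package_size_m <= other_package_size_m:
        return "avoid"
    return "request_other_to_avoid"

if __name__ == "__main__":
    print(decide_action(other_has_package=True,
                        other_package_size_m=0.8,
                        own_package_size_m=0.3))   # -> "avoid"
```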
 The outer shape of an object appearing in the input image can also be used as semantic information. Specifically, if the detected object has sharp edges, the AGV travels at a distance from it so that it can be operated safely without being damaged. If the object is flat like a wall, traveling at a constant distance from it suppresses wobbling of the AGV and allows stable and efficient operation.
 The danger or fragility of the object itself may also be recognized as semantic information. For example, when the word "danger" or a skull mark printed on a cardboard box is recognized, the AGV is controlled to stay at least a predetermined distance away from that box. In this way the AGV can be operated more safely based on the danger or fragility of objects. In addition, the lighting state of a stack light indicating the operating status of an automatic machine in the factory may be recognized, and a control value calculated so that the AGV does not approach within a predetermined distance while the machine is operating. This prevents the AGV from being detected by the machine's safety sensor and stopping the machine, so the AGV can be operated efficiently.
 <Modification 8-5>
 In the present embodiment, a method of decelerating the AGV based on the semantic information has been described. However, the control method is not limited to this, and any method that allows the AGV to be operated efficiently and safely may be used. For example, the acceleration and deceleration parameters may be changed, enabling fine control such as whether to decelerate gently or sharply according to the semantic information. The avoidance parameters may also be changed, switching between avoiding the object closely, avoiding it by a wide margin, changing the route, or stopping. The frequency at which the AGV control value is calculated may also be increased or decreased: increasing the frequency enables finer control, while decreasing it results in more gradual control. By changing the control method based on the semantic information in this way, the AGV is operated more efficiently and safely.
 [Embodiment 9]
 In Embodiment 8, the AGV was controlled based on static semantic information about its surroundings at a single point in time, such as the stacking degree and shape of nearby objects and the state of structures. In Embodiment 9, the AGV is controlled based on temporal changes in such information. The semantic information in the present embodiment refers to the amount of movement of an object appearing in the image. In the present embodiment, a method is described in which, in addition to the movement amount of an object in the image, the type of the object is also recognized, and the control value of the AGV is calculated based on the results. Specifically, other AGVs and the packages loaded on them are recognized as the types of surrounding objects, the movement amounts of the other AGVs are recognized, and the control values of the own AGV or of the other AGVs are calculated based on these recognition results.
 The configuration of the information processing apparatus in the present embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 14, so its description is omitted. The difference from Embodiment 8 is that the semantic information estimated by the semantic information recognition unit 8120 and input to the control unit 8130 consists of the detected object types, namely other AGVs and the packages loaded on them, and the movement amounts of the other AGVs.
 The processing procedure in the present embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 15, so its description is omitted. The differences from Embodiment 8 are the processing contents of the semantic information estimation step S820 and the control value calculation step S830.
 In the semantic information estimation step S820, the semantic information recognition unit 8120 divides the depth map into regions and estimates the object type for each region. At the same time, the position and size of each object are estimated. Next, among the detected objects, the position of another AGV is compared with its past position to calculate the movement amount of that AGV. In the present embodiment, the movement amount of another AGV is the amount of change of its position and orientation relative to the own AGV.
 First, as described in Embodiment 6, the depth map is divided into regions based on the image and the depth map, and the object type of each region is identified.
 Next, the regions recognized as AGVs are extracted, and their relative positional relationships with the other regions are calculated. A region whose distance from an AGV is smaller than a predetermined threshold and which lies in the vertical (Y-axis) direction above the region recognized as that AGV is determined to be the package region loaded on that AGV. Furthermore, the size of the AGV and the size of the loaded package region are obtained, where the size is the length of the long side of the bounding box enclosing the region.
 Then, the regions recognized as AGVs at time t-1 and at time t are extracted, and their relative positional relationship is calculated using the ICP algorithm. The calculated relative positional relationship is the amount of change of the other AGV's position and orientation relative to the own AGV, and is hereinafter referred to as the movement amount of the other AGV.
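 The sketch below illustrates the idea of estimating the other AGV's movement amount from its segmented 3D points at times t-1 and t. The embodiment uses the ICP algorithm; for brevity this sketch performs a single Kabsch (SVD) alignment step and assumes point correspondences are already known, which a full ICP implementation would establish iteratively.

```python
# Minimal sketch (assumption): movement amount of another AGV from the 3D points
# of its segmented region at time t-1 and t, using one Kabsch alignment step.

import numpy as np

def rigid_transform(points_prev: np.ndarray, points_curr: np.ndarray):
    """Return rotation R and translation t mapping points_prev onto points_curr.
    Both arrays are (N, 3) with corresponding rows."""
    c_prev = points_prev.mean(axis=0)
    c_curr = points_curr.mean(axis=0)
    H = (points_prev - c_prev).T @ (points_curr - c_curr)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # reflection correction
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_curr - R @ c_prev
    return R, t

def movement_amount(points_prev, points_curr):
    """Translation norm and rotation angle of the other AGV relative to the own AGV."""
    R, t = rigid_transform(np.asarray(points_prev), np.asarray(points_curr))
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return float(np.linalg.norm(t)), float(angle)

if __name__ == "__main__":
    prev = np.random.rand(100, 3)
    curr = prev + np.array([0.3, 0.0, 0.0])   # the other AGV moved 0.3 m along X
    print(movement_amount(prev, curr))         # -> (~0.3, ~0.0)
```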
 In the control value calculation step S830, the control unit 8130 determines the action of the own AGV based on the movement amount of the other AGV calculated by the semantic information recognition unit 8120 in step S820 and on the sizes of the other AGV and the package loaded on it.
 First, it is determined from the movement amount whether the other AGV is approaching or moving away from the own AGV. If the other AGV is moving away, the control value is not changed. If it is approaching, a new control value is calculated based on the package sizes. Specifically, the size of the own AGV, which has been stored in the RAM (H13) in advance by input means (not shown), is compared with the sizes of the other AGV and its package. If the own AGV is smaller, the own AGV plans a route that avoids the other AGV. If the own AGV is larger, it decelerates and, via the communication interface H17, sends a signal to the mobile body management system 13 that causes the detected other AGV to perform the avoidance maneuver.
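 A minimal sketch of this step S830 decision, under the assumption that "approaching" is judged from the change in distance to the other AGV and that the control value is represented as a simple dictionary, might be:

```python
# Minimal sketch (assumption): S830 decision of Embodiment 9. The structure of
# the control value and the 50% deceleration factor are hypothetical.

def s830_control(own_size_m: float,
                 other_size_m: float,
                 other_package_size_m: float,
                 dist_prev_m: float,
                 dist_curr_m: float,
                 current_control: dict) -> dict:
    """Return the updated control value; may also flag a yield request to be
    sent to the mobile body management system 13."""
    if dist_curr_m >= dist_prev_m:
        return current_control                    # moving away: keep control value
    other_total = max(other_size_m, other_package_size_m)
    if own_size_m < other_total:
        # Own AGV is smaller: plan a route that avoids the other AGV.
        return {**current_control, "action": "replan_route_to_avoid"}
    # Own AGV is larger: slow down and ask the management system to divert the other AGV.
    return {**current_control,
            "speed": current_control.get("speed", 1.0) * 0.5,
            "send_yield_request_to_system13": True}

if __name__ == "__main__":
    print(s830_control(0.6, 0.9, 1.2, dist_prev_m=5.0, dist_curr_m=4.2,
                       current_control={"speed": 1.0}))
```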
 As described above, in Embodiment 9 the types of objects around the AGV are determined as semantic information, and for other AGVs the movement amount and the size of the loaded package are also estimated; the control value is calculated based on these results. If the other AGV or its package is larger than the own AGV, the own AGV avoids it; conversely, if it is smaller, the other AGV is made to yield. In this way, the smaller AGV carrying the smaller package gives way, so the AGVs can be operated efficiently in terms of time and energy.
 <Modification 9-1>
 In the present embodiment, AGVs were detected as the other mobile bodies. However, the detection is not limited to AGVs; any object may be detected as long as at least its position or orientation changes and the control of the AGV can be changed accordingly. Specifically, a forklift or a mobile robot may be detected as the mobile body. The movement amount of part of a device may also be recognized as semantic information, and the control of the AGV changed accordingly. For example, if the movable part of equipment such as an automatic machine, the arm of an arm robot, or a belt conveyor moves faster than a predetermined operating speed, the AGV may be controlled to keep a predetermined distance from it.
 <Modification 9-2>
 In the present embodiment, a control value was calculated such that either the own AGV or the other AGV performs the avoidance, but any control method that changes the control value according to the movement of the target object may be used. Specifically, the approach-rejection values of the occupancy map described in Embodiment 8 may be updated dynamically according to the magnitude of the movement amount, and the control value of the AGV may be calculated using the updated map.
 A control value may also be calculated such that, if another AGV is moving in the same direction as the own AGV, the own AGV follows it. When the own AGV reaches an intersection, a control value may be calculated that makes it wait until an AGV that arrived first from the crossing direction has passed, or, when the own AGV enters the intersection first, a control value may be calculated that makes the other AGV wait via the mobile body management system 13. Furthermore, when it is observed that another AGV is oscillating left and right with respect to its traveling direction, or that a package loaded on another AGV is vibrating relative to that AGV, a control value may be calculated so that the own AGV takes a route that does not come closer than a certain distance.
 A work process may further be recognized as semantic information from the movement of objects. For example, it may be recognized that a robot is loading a package onto another AGV; in that case, a control value may be calculated so that the own AGV searches for another route. As another example, in a distribution warehouse, the operation of placing packages onto a shipping pallet may be recognized, and a mobile body (a forklift) may be controlled to approach that pallet. In this way, the movement of targets is recognized as semantic information, and mobile bodies such as AGVs and forklifts are controlled accordingly to operate more efficiently.
 [Embodiment 10]
 Embodiment 10 describes a method of operating the AGV more safely based on the result of recognizing a person's work or role. In the present embodiment, a person and the type of object the person holds are recognized as semantic information, the person's work type is estimated from them, and the AGV is controlled according to the work type. As concrete examples, a person and the hand lift the person is pushing are detected, transport work is recognized as the work type, and the AGV is made to avoid the person; or a person and the welding machine the person holds are detected, welding work is recognized as the work type, and the route of the AGV is changed. In the present embodiment, it is assumed that the approach-rejection parameters that determine the control of the AGV for each combination of a person and a held object are given manually in advance. Specifically, the parameter is, for example, 0.4 when the person is carrying a large package, 0.6 when the person is pushing a cart, and 0.9 when the person is holding a welding machine. In the present embodiment, the mobile body management system 13 holds a parameter list containing these parameters. As necessary, the list is downloaded from the mobile body management system 13 to the information processing apparatus 80 via the communication I/F (H17) and held in the external memory (H14) so that it can be referred to.
 The configuration of the information processing apparatus in the present embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 14, so its description is omitted. The difference from Embodiment 8 is the semantic information that the semantic information recognition unit 8120 estimates and inputs to the control unit 8130.
 The processing procedure in the present embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 15, so its description is omitted. The differences from Embodiment 8 are the processing contents of the semantic information estimation step S820 and the control value calculation step S830.
 In the semantic information estimation step S820, the semantic information recognition unit 8120 recognizes a person and the type of object held by the person from the input image. The AGV is then controlled based on the parameter list, held in advance in the external memory H14, that records the AGV control rules corresponding to persons and the objects they hold.
 First, the region of a person's hand is detected from the visual information. For this detection, a method is used that recognizes each body part of a person and the connections between them and estimates the person's skeleton. The image coordinates corresponding to the position of the person's hand are then obtained.
 Next, the type of object held by the person is detected. For object detection, the neural network described in Embodiment 6, trained to segment an image into regions by object type, is used. Among the segmented regions, a region within a predetermined distance of the image coordinates of the person's hand is recognized as the object region held by the person, and the object type assigned to that region is obtained. The object type here is uniquely associated with an object ID held in the aforementioned list.
 Finally, the approach-rejection parameter is obtained by referring to the obtained object ID and the aforementioned control parameter list. The semantic information recognition unit 8120 inputs the obtained parameter to the control unit 8130.
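 A minimal sketch of this lookup, assuming the hand position and the segmented regions with object IDs are already available, might look as follows; the parameter values mirror the examples given earlier, while the region centers, IDs, distance threshold, and default value are hypothetical.

```python
# Minimal sketch (assumption): associating a detected hand position with a
# segmented object region and looking up its approach-rejection parameter.

import math

# Hypothetical parameter list downloaded from the mobile body management system 13.
APPROACH_REJECTION = {
    "large_package": 0.4,
    "hand_cart": 0.6,
    "welder": 0.9,
}

def held_object_parameter(hand_xy, regions, max_dist_px=60.0, default=0.2):
    """regions: list of (object_id, (cx, cy)) from semantic segmentation.
    Returns the approach-rejection parameter of the object nearest the hand,
    or a default when nothing lies within max_dist_px."""
    best_id, best_dist = None, float("inf")
    for object_id, (cx, cy) in regions:
        d = math.hypot(cx - hand_xy[0], cy - hand_xy[1])
        if d < best_dist:
            best_id, best_dist = object_id, d
    if best_id is None or best_dist > max_dist_px:
        return default
    return APPROACH_REJECTION.get(best_id, default)

if __name__ == "__main__":
    regions = [("welder", (310, 205)), ("hand_cart", (120, 400))]
    print(held_object_parameter((300, 210), regions))   # -> 0.9
```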
 In the control value calculation step S830, the control unit 8130 determines the action of the own AGV based on the approach-rejection parameters of the objects calculated by the semantic information recognition unit 8120 in step S820. The control value is calculated by updating the approach-rejection values of the occupancy map described in Embodiment 8 as follows; the update is a function that becomes larger as the approach-rejection parameter becomes larger and smaller as the distance increases.
 (Equation 5: published as an image, JPOXMLDOC01-appb-M000005)
 Here, Score_j is the value of the j-th grid cell, and s_i is the parameter representing the approach-rejection degree of the i-th object detected in step S820. Using the occupancy map defined in this way, the traveling route of the AGV is determined as described in Embodiment 8.
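 Equation 5 is reproduced only as an image in the publication; based on the surrounding description (a cell score that grows with the approach-rejection parameters s_i of the detected objects and decays with distance), one plausible form, given here purely as an assumption, is:

```latex
% A plausible reconstruction only -- the exact published formula (Equation 5)
% is an image and may differ. d_{ij} denotes the distance between grid cell j
% and the i-th detected object.
\[
  \mathrm{Score}_j \;=\; \min\!\Bigl(1,\; \sum_i \frac{s_i}{1 + d_{ij}}\Bigr)
\]
```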
 Furthermore, the control value is calculated so that the maximum speed v_max of the AGV is limited as follows, based on the approach-rejection value of the occupancy map cell the AGV is currently traveling through.
 (Equation 6: published as an image, JPOXMLDOC01-appb-M000006)
 Here, α is a parameter that adjusts the relationship between the approach-rejection value of the occupancy map and the speed, and β is the approach-rejection value of the occupancy map cell the AGV is currently passing through. v_max is calculated as a value that approaches 0 as the approach-rejection value of the occupancy map increases (approaches 1). The control unit 8130 outputs the control value calculated in this way to the actuator 130.
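 Equation 6 is likewise available only as an image; a form consistent with the description (v_max approaches 0 as β approaches 1, with α as an adjustment parameter) could, as an assumption, be:

```latex
% A plausible reconstruction only -- the exact published formula (Equation 6)
% is an image and may differ.
\[
  v_{\max} \;=\; \alpha\,(1 - \beta)
\]
```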
 In Embodiment 10, the type of a person's work is determined from the combination of the person and the object the person holds, and a parameter representing the approach-rejection degree is determined. A control value is then calculated so that the larger the approach-rejection degree, the more slowly the AGV moves and the farther it keeps from the person. The AGV is thus controlled at an appropriate distance according to the person's work, and can therefore be controlled more safely.
 <Modification>
 In Embodiment 10, the combination of a person and the object held by the person was recognized as semantic information, but any method that controls the AGV more safely by recognizing a state associated with a person may be used.
 A person's clothing may be recognized as semantic information. For example, in a factory it may be recognized that a person wearing work clothes is a worker and a person wearing a suit is a visitor. Using this recognition result, the AGV proceeds more slowly when passing near visitors, who are less accustomed to the movements of AGVs than workers are, and is thereby controlled more safely.
 A person's age may be recognized as semantic information. For example, an AGV that performs in-hospital delivery can be operated more safely by passing slowly and at a predetermined distance when it recognizes a child or an elderly person.
 A person's movement may be recognized as semantic information. For example, an AGV that carries luggage in a hotel can be operated more safely by calculating a control value that makes it pass at a predetermined distance when it recognizes that a person is repeatedly moving back and forth and side to side, as when staggering.
 Work may also be recognized from a person's movement. Specifically, when a worker in a factory is detected to be about to load a package onto the AGV, a control value may be calculated so that the AGV slowly approaches the worker and stops until the loading is finished. This eliminates the need for the worker to walk to the AGV's stopping position before loading, so the work can be performed efficiently.
 The number of people may be recognized as semantic information. Specifically, when more than a predetermined number of people are recognized on the traveling route of the AGV, the route is changed. This avoids the possibility of contact with a person that could occur if the AGV were to weave its way through a crowd, so the AGV can be operated more safely.
 [Embodiment 11]
 A UI that can be applied to any of Embodiments 8 to 10 will be described. In addition to presenting the visual information acquired by the imaging unit 110 and the position and orientation, map information, and control values calculated by the position and orientation calculation unit 8110, the UI further displays information such as the semantic information described in Embodiments 8 to 10 and the values assigned to the occupancy map.
 The configuration of the apparatus in Embodiment 11 is the same as the configuration of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 2, so its description is omitted. The configuration of the device for display is the same as that described in Embodiment 7 and is likewise omitted.
 FIG. 13 shows a GUI 200, which is an example of the display information presented by the display device in the present embodiment. G210 is a window for presenting the visual information acquired by the imaging unit 110 and the semantic information recognized by the semantic information recognition unit 8120. G220 is a window for presenting the approach-rejection degree used for the AGV navigation described in Embodiment 8. G230 is a window for presenting the 2D occupancy map. G240 is a window for presenting a GUI for operating the AGV manually, the values calculated by the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, and the AGV operation information.
 G210 shows an example of presenting, as the semantic information detected by the semantic information recognition unit 8120, a plurality of objects, their relative distances, and the approach-rejection values. G211 is the bounding box of a detected object. In the present embodiment, another AGV and its package are detected and the bounding box surrounding them is displayed with a dotted line. Although a single bounding box is presented here for the combined objects, a bounding box may instead be drawn for each detected object. The bounding box may be drawn in any way that indicates the position of the detected object, with a dotted line or a solid line, or by superimposing a semi-transparent mask. G212 is a pop-up that presents the detected semantic information: the detected object types, their distances, and the approach-rejection values. By superimposing the recognized semantic information on the visual information in this way, the user can intuitively grasp the relationship between the visual information and the semantic information.
 G220 is an example in which the approach-rejection degree of the AGV calculated by the control unit 8130 is superimposed on the visual information acquired by the imaging unit 110. In G221, a darker color is superimposed where the approach-rejection degree is higher. By presenting the approach-rejection degree superimposed on the visual information in this way, the user can intuitively associate the visual information with the approach-rejection values. The color, density, or shape of G221 may be varied so that the user can grasp the approach-rejection degree more easily.
 G230 is an example of presenting the occupancy map calculated by the control unit 8130 and the semantic information recognized by the semantic information recognition unit 8120. G231 visualizes the approach-rejection values of the occupancy map so that cells with larger values are drawn darker and cells with smaller values lighter. G232 further presents the position of a structure as semantic information recognized by the semantic information recognition unit 8120; in the present embodiment, the result of recognizing that a factory door is open is presented. G233 further presents the movement amounts of surrounding objects as semantic information recognized by the semantic information recognition unit 8120; in the present embodiment, the moving direction and speed of each object are presented. By presenting the occupancy map, its values, and the recognition results of the semantic information in this way, the user can easily relate them to one another and grasp the internal state of the AGV. Presenting the occupancy map in this way also allows the user to easily follow the route generation process of the control unit 8130.
 G240 shows an example of presenting a GUI for operating the AGV manually, the values calculated by the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, and the AGV operation information. G241 is a GUI for settings such as which semantic information the semantic information recognition unit 8120 recognizes and whether to display the recognition results, for example radio buttons that toggle each item on and off. G242 is a GUI for adjusting the approach-rejection distance calculated by the control unit 8130 and the parameters used to calculate the control value, for example slide bars or numeric input forms.
 <Modification>
 The GUI described in the present embodiment is an example, and any visualization method may be used as long as it presents information such as the semantic information calculated by the semantic information recognition unit 8120 and the approach-rejection values of the occupancy map calculated by the control unit 8130 and allows the internal state of the AGV to be grasped. For example, the display information can be changed by changing colors, switching line thickness or between solid, broken, and double lines, scaling, or hiding unnecessary information. Changing the way the display information is visualized in this manner allows the user to understand it more intuitively.
 The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium and having one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
 The present invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the present invention. Accordingly, the following claims are attached to make the scope of the present invention public.
 This application claims priority based on Japanese Patent Application No. 2018-003817 filed on January 12, 2018, the entire contents of which are incorporated herein by reference.

Claims (20)

  1.  An information processing apparatus comprising:
      an input unit that receives input of image information acquired by an imaging unit mounted on a mobile body, each light receiving section on the imaging element of the imaging unit being composed of two or more light receiving elements;
      a holding unit that holds map information;
      an acquisition unit that acquires a position and orientation of the imaging unit based on the image information and the map information; and
      a control unit that obtains a control value for controlling movement of the mobile body based on the position and orientation acquired by the acquisition unit.
  2.  The information processing apparatus according to claim 1, wherein the image information is information captured by the imaging unit, and the image information is a depth map generated by the imaging unit by selectively using the light receiving elements.
  3.  The information processing apparatus according to claim 1, wherein the image information is information captured by the imaging unit, and the image information is a three-dimensional point cloud that holds three-dimensional position information in space and is generated by the imaging unit by selectively using the light receiving elements.
  4.  The information processing apparatus according to claim 2 or 3, wherein the image information further includes an image generated by the imaging unit by selectively using the light receiving elements.
  5.  The information processing apparatus according to any one of claims 1 to 4, wherein the acquisition unit updates the image information using the image information and image information acquired at a second time preceding the first time at which the imaging unit acquired the image information, and acquires the position and orientation of the imaging unit based on the updated image information and the map information.
  6.  The information processing apparatus according to any one of claims 1 to 5, wherein the control unit further calculates a control value for controlling a projection device that projects pattern light.
  7.  The information processing apparatus according to any one of claims 1 to 6, wherein the input unit further receives input of three-dimensional information measured by a three-dimensional measuring device that acquires three-dimensional information representing three-dimensional positions in space, and the acquisition unit further updates the image information based on the image information and the three-dimensional information and acquires the position and orientation of the imaging unit based on the updated image information and the map information.
  8.  The information processing apparatus according to any one of claims 1 to 7, wherein the acquisition unit further acquires feature information of an object from one or both of the image information and the map information, and the control unit calculates a control value for controlling the mobile body based on the feature information of the object.
  9.  The information processing apparatus according to claim 8, wherein the control unit further calculates, based on the feature information of the object, a control value for controlling the mobile body such that a predetermined object is located at a predetermined point in the image information.
  10.  The information processing apparatus according to any one of claims 1 to 9, wherein the acquisition unit further divides the image information into regions by semantic segmentation.
  11.  The information processing apparatus according to claim 10, wherein the acquisition unit further generates and updates the map information based on a result of the region division.
  12.  The information processing apparatus according to claim 10 or 11, wherein the control unit further calculates a control value for controlling the mobile body based on a result of the region division.
  13.  The information processing apparatus according to any one of claims 1 to 12, wherein the control unit further calculates an adjustment value for adjusting a parameter of the imaging unit based on at least one of the image information, the map information, the acquired position and orientation, and the control value.
  14.  The information processing apparatus according to claim 13, wherein the adjustment value is a focus value of the imaging unit.
  15.  The information processing apparatus according to claim 13, wherein the adjustment value is a zoom value of the imaging unit.
  16.  The information processing apparatus according to any one of claims 1 to 15, wherein an optical device of the imaging unit is exchangeable, and the input unit further acquires parameters of the exchanged optical device.
  17.  The information processing apparatus according to any one of claims 1 to 16, further comprising a display information generation unit that generates display information based on at least one of the image information, the map information, the position and orientation, and the control value.
  18.  An information processing method comprising:
      an input step of receiving input of image information acquired by an imaging unit mounted on a mobile body, each light receiving section on the imaging element being composed of two or more light receiving elements;
      a step of holding map information in a holding unit;
      an acquisition step of acquiring a position and orientation of the imaging unit based on the image information and the map information; and
      a control step of calculating a control value for controlling movement of the mobile body based on the acquired position and orientation.
  19.  An information processing system comprising:
      an imaging unit in which each light receiving section on the imaging element is composed of two or more light receiving elements;
      an input unit that receives input of image information acquired by the imaging unit;
      a holding unit that holds map information;
      an acquisition unit that acquires a position and orientation of the imaging unit based on the image information and the map information; and
      a control unit that obtains a control value for controlling movement of the mobile body based on the position and orientation acquired by the acquisition unit.
  20.  A mobile body comprising:
      an imaging unit in which each light receiving section on the imaging element is composed of two or more light receiving elements;
      an input unit that receives input of image information acquired by the imaging unit;
      a holding unit that holds map information;
      an acquisition unit that acquires a position and orientation of the imaging unit based on the image information and the map information;
      a control unit that obtains a control value for controlling movement of the mobile body based on the position and orientation acquired by the acquisition unit; and
      an actuator that controls movement of the mobile body with the control value.
PCT/JP2018/047022 2018-01-12 2018-12-20 Information processing device, information processing method, program, and system WO2019138834A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018003817 2018-01-12
JP2018-003817 2018-01-12

Publications (1)

Publication Number Publication Date
WO2019138834A1 true WO2019138834A1 (en) 2019-07-18

Family

ID=67218687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/047022 WO2019138834A1 (en) 2018-01-12 2018-12-20 Information processing device, information processing method, program, and system

Country Status (2)

Country Link
JP (1) JP7341652B2 (en)
WO (1) WO2019138834A1 (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7221183B2 (en) * 2019-09-20 2023-02-13 株式会社日立製作所 Machine learning method, forklift control method, and machine learning device
CN114556252A (en) * 2019-10-10 2022-05-27 索尼集团公司 Information processing apparatus, information processing method, and program
SG10201913873QA (en) * 2019-12-30 2021-07-29 Singpilot Pte Ltd Sequential Mapping And Localization (SMAL) For Navigation
US20230028976A1 (en) * 2020-01-16 2023-01-26 Sony Group Corporation Display apparatus, image generation method, and program
JP7429143B2 (en) 2020-03-30 2024-02-07 本田技研工業株式会社 Mobile object control device, mobile object control method, and program
JP6849256B1 (en) * 2020-05-08 2021-03-24 シンメトリー・ディメンションズ・インク 3D model construction system and 3D model construction method
GB2598758B (en) * 2020-09-10 2023-03-29 Toshiba Kk Task performing agent systems and methods
US20240118707A1 (en) * 2020-12-15 2024-04-11 Nec Corporation Information processing apparatus, moving body control system, control method, and non-transitory computer-readable medium
US20240054674A1 (en) * 2020-12-25 2024-02-15 Nec Corporation System, information processing apparatus, method, and computer-readable medium
JP2022175900A (en) * 2021-05-14 2022-11-25 ソニーグループ株式会社 Information processing device, information processing method, and program
JP7447060B2 (en) * 2021-07-29 2024-03-11 キヤノン株式会社 Information processing device, information processing method, autonomous robot device, and computer program
JP2024007754A (en) * 2022-07-06 2024-01-19 キヤノン株式会社 Information processing system, information processing device, terminal device and control method of information processing system
WO2024057800A1 (en) * 2022-09-12 2024-03-21 株式会社島津製作所 Method for controlling mobile object, transport device, and work system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010122904A (en) * 2008-11-19 2010-06-03 Hitachi Ltd Autonomous mobile robot
JP2011137697A (en) * 2009-12-28 2011-07-14 Canon Inc Illumination apparatus, and measuring system using the illumination system
JP2011237215A (en) * 2010-05-07 2011-11-24 Nikon Corp Depth map output device
JP2016170060A (en) * 2015-03-13 2016-09-23 三菱電機株式会社 Facility information display system, mobile terminal, server and facility information display method
JP2016192028A (en) * 2015-03-31 2016-11-10 株式会社デンソー Automatic travel control device and automatic travel control system
WO2017094317A1 (en) * 2015-12-02 2017-06-08 ソニー株式会社 Control apparatus, control method, and program
JP2017122993A (en) * 2016-01-05 2017-07-13 キヤノン株式会社 Image processor, image processing method and program
JP2017157803A (en) * 2016-03-04 2017-09-07 キヤノン株式会社 Imaging apparatus
JP2017156162A (en) * 2016-02-29 2017-09-07 キヤノン株式会社 Information processing device, information processing method, and program
JP2017215525A (en) * 2016-06-01 2017-12-07 キヤノン株式会社 Imaging device and method for controlling the same, program, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6445808B2 (en) 2014-08-26 2018-12-26 三菱重工業株式会社 Image display system
JP6657034B2 (en) 2015-07-29 2020-03-04 ヤマハ発動機株式会社 Abnormal image detection device, image processing system provided with abnormal image detection device, and vehicle equipped with image processing system


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220282987A1 (en) * 2019-08-08 2022-09-08 Sony Group Corporation Information processing system, information processing device, and information processing method
JP2021060849A (en) * 2019-10-08 2021-04-15 国立大学法人静岡大学 Autonomous mobile robot and control program for autonomous mobile robot
JP7221839B2 (en) 2019-10-08 2023-02-14 国立大学法人静岡大学 Autonomous Mobile Robot and Control Program for Autonomous Mobile Robot
CN110826604A (en) * 2019-10-24 2020-02-21 西南交通大学 Material sorting method based on deep learning
CN110926425A (en) * 2019-11-01 2020-03-27 宁波大学 Navigation logistics transportation system of 3D structured light camera and control method thereof
US20210217194A1 (en) * 2020-01-13 2021-07-15 Samsung Electronics Co., Ltd. Method and apparatus with object information estimation and virtual object generation
WO2022113771A1 (en) * 2020-11-26 2022-06-02 ソニーグループ株式会社 Autonomous moving body, information processing device, information processing method, and program
WO2022254609A1 (en) * 2021-06-02 2022-12-08 国立大学法人東北大学 Information processing device, moving body, information processing method, and program
US20230082486A1 (en) * 2021-09-13 2023-03-16 Southwest Research Institute Obstacle Detection and Avoidance System for Autonomous Aircraft and Other Autonomous Vehicles
EP4261181A1 (en) * 2022-04-11 2023-10-18 STILL GmbH Surroundings monitoring system and industrial truck with surroundings monitoring system

Also Published As

Publication number Publication date
JP7341652B2 (en) 2023-09-11
JP2019125345A (en) 2019-07-25

Similar Documents

Publication Publication Date Title
WO2019138834A1 (en) Information processing device, information processing method, program, and system
US11704812B2 (en) Methods and system for multi-target tracking
US11592845B2 (en) Image space motion planning of an autonomous vehicle
US11861892B2 (en) Object tracking by an unmanned aerial vehicle using visual sensors
US11573574B2 (en) Information processing apparatus, information processing method, information processing system, and storage medium
US10809081B1 (en) User interface and augmented reality for identifying vehicles and persons
US10837788B1 (en) Techniques for identifying vehicles and persons
US20210133996A1 (en) Techniques for motion-based automatic image capture
US20210138654A1 (en) Robot and method for controlling the same
KR102597216B1 (en) Guidance robot for airport and method thereof
WO2017071143A1 (en) Systems and methods for uav path planning and control
JP7479799B2 (en) Information processing device, information processing method, program, and system
CN110609562B (en) Image information acquisition method and device
CN108280853A (en) Vehicle-mounted vision positioning method, device and computer readable storage medium
US11748998B1 (en) Three-dimensional object estimation using two-dimensional annotations
Spitzer et al. Fast and agile vision-based flight with teleoperation and collision avoidance on a multirotor
US11334094B2 (en) Method for maintaining stability of mobile robot and mobile robot thereof
JP6609588B2 (en) Autonomous mobility system and autonomous mobility control method
Sheikh et al. Stereo vision-based optimal path planning with stochastic maps for mobile robot navigation
US20240085916A1 (en) Systems and methods for robotic detection of escalators and moving walkways
US11846514B1 (en) User interface and augmented reality for representing vehicles and persons
JP2021099384A (en) Information processing apparatus, information processing method, and program
JP2021099383A (en) Information processing apparatus, information processing method, and program
EP4024155B1 (en) Method, system and computer program product of control of unmanned aerial vehicles
Gujarathi et al. Design and Development of Autonomous Delivery Robot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900349

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18900349

Country of ref document: EP

Kind code of ref document: A1