WO2023085017A1 - Learning method, learning program, information processing device, information processing method, and information processing program - Google Patents

Learning method, learning program, information processing device, information processing method, and information processing program

Info

Publication number
WO2023085017A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud data
three-dimensional point cloud
image
information processing
Prior art date
Application number
PCT/JP2022/038868
Other languages
French (fr)
Japanese (ja)
Inventor
周平 花澤
Original Assignee
Sony Semiconductor Solutions Corporation
Priority date
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation
Publication of WO2023085017A1 publication Critical patent/WO2023085017A1/en

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 - Lidar systems specially adapted for specific applications
    • G01S17/89 - Lidar systems specially adapted for specific applications for mapping or imaging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis

Definitions

  • the present disclosure relates to a learning method, a learning program, an information processing device, an information processing method, and an information processing program.
  • when generating teacher data, for example, a plurality of depth images generated from three-dimensional point cloud data scanned multiple times are added (synthesized), and a stereo image derived from the synthesized image is used to remove erroneous point cloud data, thereby generating the ground truth.
  • the present disclosure proposes a learning method, a learning program, an information processing device, an information processing method, and an information processing program that can reduce the amount of processing required to generate ground truth.
  • a learning method according to the present disclosure is a computer-executed learning method that includes: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure
  • FIG. 2 is an explanatory diagram of a learning method according to the present disclosure
  • FIG. 3 is a diagram illustrating an example of an image according to the present disclosure
  • FIG. 4 is an explanatory diagram of label data added to an image according to the present disclosure
  • FIG. 5 is an explanatory diagram of processing executed by an information processing apparatus according to the present disclosure
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
  • the vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
  • the vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
  • Vehicle control ECU 21, communication unit 22, map information storage unit 23, position information acquisition unit 24, external recognition sensor 25, in-vehicle sensor 26, vehicle sensor 27, storage unit 28, driving support/automatic driving control unit 29, driver monitoring system (DMS) 30, human machine interface (HMI) 31, and vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other.
  • the communication network 41 is composed of, for example, an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark).
  • different networks within the communication network 41 may be used depending on the type of data to be transmitted.
  • CAN may be applied to data related to vehicle control
  • Ethernet may be applied to large-capacity data.
  • each part of the vehicle control system 11 may be connected directly, without going through the communication network 41, using wireless communication intended for relatively short-range communication such as NFC (Near Field Communication) or Bluetooth (registered trademark).
  • the communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, etc., and transmits and receives various data.
  • the map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wider area, and the like.
  • the position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1 .
  • the acquired position information is supplied to the driving support/automatic driving control unit 29 .
  • the location information acquisition unit 24 is not limited to the method using GNSS signals, and may acquire location information using beacons, for example.
  • the external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the type and number of sensors included in the external recognition sensor 25 are arbitrary.
  • the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
  • the in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle, and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the types and number of various sensors included in the in-vehicle sensor 26 are not particularly limited as long as they are the types and number that can be realistically installed in the vehicle 1 .
  • in-vehicle sensors 26 may comprise one or more of cameras, radar, seat sensors, steering wheel sensors, microphones, biometric sensors.
  • the vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1, and supplies sensor data from each sensor to each section of the vehicle control system 11.
  • the types and number of various sensors included in the vehicle sensor 27 are not particularly limited as long as the types and number are practically installable in the vehicle 1 .
  • the vehicle sensor 27 includes a velocity sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU (Inertial Measurement Unit)) integrating them.
  • the storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs.
  • the storage unit 28 includes, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied.
  • the storage unit 28 stores various programs and data used by each unit of the vehicle control system 11 .
  • the driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1 .
  • the driving support/automatic driving control unit 29 includes an analysis unit 61 , an action planning unit 62 and an operation control unit 63 .
  • the analysis unit 61 analyzes the vehicle 1 and its surroundings.
  • the analysis unit 61 includes a self-position estimation unit 71 , a sensor fusion unit 72 and a recognition unit 73 .
  • the self-position estimation unit 71 estimates the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
  • the sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
  • the recognition unit 73 executes a detection process for detecting the situation outside the vehicle 1 and a recognition process for recognizing the situation outside the vehicle 1 .
  • the recognition unit 73 performs detection processing and recognition processing of the external situation of the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
  • the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1 .
  • Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object.
  • Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object.
  • detection processing and recognition processing are not always clearly separated, and may overlap.
  • the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
  • the recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the masses of point groups classified by clustering. As a result, the speed and traveling direction (movement vector) of the object around the vehicle 1 are detected.
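For illustration only, the clustering-based detection described above can be sketched as follows. This is not the actual implementation of the recognition unit 73; the DBSCAN parameters and the returned fields are assumptions.

```python
# Illustrative sketch: cluster a LiDAR point cloud into object candidates and
# report rough position and size, in the spirit of the clustering described above.
# The eps/min_points values are assumptions, not taken from the patent.
import numpy as np
from sklearn.cluster import DBSCAN

def detect_objects_from_points(points_xyz: np.ndarray,
                               eps: float = 0.5,
                               min_points: int = 10) -> list[dict]:
    """points_xyz: (N, 3) array of LiDAR points in vehicle coordinates."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points_xyz)
    objects = []
    for label in set(labels):
        if label == -1:                      # -1 marks noise points (no cluster)
            continue
        cluster = points_xyz[labels == label]
        objects.append({
            "center": cluster.mean(axis=0),                      # rough position
            "size": cluster.max(axis=0) - cluster.min(axis=0),   # rough extent
            "num_points": len(cluster),
        })
    return objects
```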
  • the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, etc. based on image data supplied from the camera 51 . Further, the recognition unit 73 may recognize types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
  • the action plan section 62 creates an action plan for the vehicle 1.
  • the action planning unit 62 creates an action plan by performing route planning and route following processing.
  • global path planning is the process of planning a rough route from the start to the goal. This route planning also includes trajectory planning: processing to generate, along the planned route, a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity in consideration of the motion characteristics of the vehicle 1.
  • the motion control unit 63 controls the motion of the vehicle 1 in order to implement the action plan created by the action planning unit 62.
  • the DMS 30 performs driver authentication processing, driver state recognition processing, etc., based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31, which will be described later.
  • the driver's state to be recognized includes, for example, physical condition, alertness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like.
  • the HMI 31 receives input of various data, instructions, and the like, and presents various data to the driver and others.
  • the vehicle control unit 32 controls each unit of the vehicle 1.
  • the vehicle control section 32 includes a steering control section 81 , a brake control section 82 , a drive control section 83 , a body system control section 84 , a light control section 85 and a horn control section 86 .
  • the steering control unit 81 detects and controls the state of the steering system of the vehicle 1 .
  • the steering system includes, for example, a steering mechanism including a steering wheel, an electric power steering, and the like.
  • the steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
  • the brake control unit 82 detects and controls the state of the brake system of the vehicle 1 .
  • the brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like.
  • the brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
  • the drive control unit 83 detects and controls the state of the drive system of the vehicle 1 .
  • the drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels.
  • the drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
  • the body system control unit 84 detects and controls the state of the body system of the vehicle 1 .
  • the body system includes, for example, a keyless entry system, smart key system, power window device, power seat, air conditioner, air bag, seat belt, shift lever, and the like.
  • the body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
  • the light control unit 85 detects and controls the states of various lights of the vehicle 1 .
  • Lights to be controlled include, for example, headlights, backlights, fog lights, turn signals, brake lights, projections, bumper displays, and the like.
  • the light control unit 85 includes a light ECU that controls the light, an actuator that drives the light, and the like.
  • the horn control unit 86 detects and controls the state of the car horn of the vehicle 1 .
  • the horn control unit 86 includes, for example, a horn ECU for controlling the car horn, an actuator for driving the car horn, and the like.
  • a typical object detection model is the SSD (Single Shot MultiBox Detector).
  • the SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images.
  • CNN machine learning uses teacher data in which each image is given the types (classes) of objects contained in the image and the ground truth (GT) indicating the regions of those objects in the image.
  • when generating the teacher data, for example, a depth image is generated from three-dimensional point cloud data obtained by scanning an object multiple times (for example, 11 scans) with a general LiDAR (Light Detection And Ranging) equipped with 60 vertical lasers.
  • the information processing device included in the recognition unit 73 generates a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 having 128 vertical lasers, and the image Pc corresponding to the three-dimensional point cloud data D1.
  • the information processing device uses the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 as the ground truth, and performs machine learning by adjusting the coefficients of the CNN so that the difference between the depth image Dm and the ground truth becomes small.
  • in the learning method according to the present disclosure, the ground truth (the point cloud data D3 remaining after thinning) can be generated simply by thinning the predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 acquired by the LiDAR 53. Therefore, according to the learning method according to the present disclosure, it is possible to greatly reduce the amount of processing required to generate the ground truth compared to the general method of generating teacher data described above.
  • when thinning out the predetermined number of point cloud data D2 from the three-dimensional point cloud data D1, the data amount (number of points) of the point cloud data D2 to be thinned out is set to less than 50% of the data amount (number of points) of the three-dimensional point cloud data D1.
  • methods of thinning out the predetermined number of point cloud data D2 from the three-dimensional point cloud data D1 include, for example, a method of thinning out random data points from the three-dimensional point cloud data D1 arranged in a matrix, and a method of thinning out the data points of one column at a time at regular column intervals.
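A minimal sketch of such random thinning, assuming the scan of the three-dimensional point cloud data D1 is stored as an H x W range image (the function name and the default rate are illustrative assumptions, not from the patent):

```python
import numpy as np

def split_point_cloud(range_image: np.ndarray, thin_rate: float = 0.4, seed: int = 0):
    """Randomly thin points out of a LiDAR scan arranged in a matrix.

    range_image: (H, W) depth values of the 3D point cloud D1 (0 where no return).
    thin_rate:   fraction of valid points to thin out (kept below 50% here,
                 following the guideline stated above).
    Returns (d2, d3): d2 holds the thinned-out points used as the network input,
    d3 holds the remaining points used as the ground truth.
    """
    rng = np.random.default_rng(seed)
    valid = range_image > 0
    thin_mask = valid & (rng.random(range_image.shape) < thin_rate)

    d2 = np.where(thin_mask, range_image, 0.0)            # thinned-out subset (input)
    d3 = np.where(valid & ~thin_mask, range_image, 0.0)   # remaining subset (ground truth)
    return d2, d3
```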
  • some images Pc are given label data including data indicating the region of a subject in the image Pc and data indicating the type (class) of the subject. Therefore, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the information processing device changes, for each type of object, the thinning rate of the point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object.
  • instead of uniformly thinning out the predetermined number of point cloud data D2 over the entire area of the three-dimensional point cloud data D1, the information processing apparatus can thereby thin out an appropriate amount of point cloud data D2 for each region of the three-dimensional point cloud data D1 according to the characteristics of the object and the purpose of object detection.
  • the image Pc includes a vehicle Vc, a plurality of poles Po, and a background Bg.
  • the image Pc is given label data LVc indicating the region of the vehicle Vc and that the object in the region is the vehicle Vc.
  • the image Pc is also given label data LPo indicating the region of the pole Po and that the object in that region is the pole Po, and label data LBg indicating the region of the background Bg and that the object in that region is the background Bg.
  • the information processing apparatus sets the thinning rate in the region of an object that is a main detection target lower than the thinning rate in the region of an object that is not a main detection target. In other words, the information processing apparatus leaves a larger amount of point cloud data in the region of a main detection target than in the region of an object that is not a main detection target. Thereby, the information processing apparatus can generate a more reliable ground truth for the region of the main detection target.
  • for the region of the vehicle Vc, for example, the information processing device thins out from the three-dimensional point cloud data D1 a predetermined number of point cloud data D2 corresponding to 50% of the points in the region, and leaves the remaining 50% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
  • for the region of the pole Po, the information processing device thins out from the three-dimensional point cloud data D1 a predetermined number of point cloud data D2 corresponding to 80% of the points in the region, and leaves the remaining 20% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
  • the information processing apparatus mainly uses the data of the image Pc when detecting the pole Po.
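The class-dependent thinning described in this example might look like the following sketch. The label encoding, the rate table, and the projection of the labels LVc/LPo/LBg onto the range image are assumptions made for illustration; only the 50%/80% rates come from the example above.

```python
import numpy as np

# Assumed per-class thinning rates: a main detection target (vehicle) keeps more
# ground-truth points (lower thinning rate) than non-main targets (pole, background).
THIN_RATE_BY_CLASS = {
    "vehicle": 0.5,     # 50% thinned out, 50% left as ground truth
    "pole": 0.8,        # 80% thinned out, 20% left as ground truth
    "background": 0.8,  # assumption: treated like a non-main target
}

def split_by_label(range_image: np.ndarray, label_map: np.ndarray, seed: int = 0):
    """range_image: (H, W) depths of D1; label_map: (H, W) class names per pixel,
    assumed to be projected from the label data added to the image Pc."""
    rng = np.random.default_rng(seed)
    valid = range_image > 0
    thin_mask = np.zeros(range_image.shape, dtype=bool)
    for cls, rate in THIN_RATE_BY_CLASS.items():
        region = valid & (label_map == cls)
        thin_mask |= region & (rng.random(range_image.shape) < rate)
    d2 = np.where(thin_mask, range_image, 0.0)            # thinned-out input D2
    d3 = np.where(valid & ~thin_mask, range_image, 0.0)   # remaining ground truth D3
    return d2, d3
```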
  • the information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
  • FIG. 5 is an explanatory diagram of processing executed by the information processing apparatus according to the present disclosure.
  • the LiDAR shown in FIG. 5 is data obtained by converting the three-dimensional point cloud data D1 acquired from the LiDAR 53 into an elevation image.
  • LiDAR' shown in FIG. 5 is a predetermined number of point cloud data D2 obtained by thinning out the point cloud data from the elevation image.
  • Frames t-1, t, and t+1 shown in FIG. 5 are RGB images captured three times in succession in time series, corresponding to the point cloud data D3 remaining after thinning.
  • the camera parameter K shown in FIG. 5 is an internal parameter of the camera 51, and is a parameter used for converting from UV coordinates with the origin at the upper left of the image Pc to camera coordinates centered on the camera 51.
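As a generic illustration of what the camera parameter K is used for (a standard pinhole-camera unprojection, not code from the patent):

```python
import numpy as np

def uv_to_camera_coords(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Convert a pixel (u, v), with its depth taken from the DepthMap, into camera
    coordinates centered on the camera 51 using the intrinsic matrix

        K = [[fx, 0, cx],
             [0, fy, cy],
             [0,  0,  1]],

    where (u, v) has its origin at the upper left of the image Pc."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```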
  • Velocity shown in FIG. 5 is the speed of the vehicle at frame t obtained from the communication network 41 (CAN).
  • the DepthEncoder shown in FIG. 5 is a network that extracts features from three-dimensional point cloud data.
  • the RGBEncoder shown in FIG. 5 is a network that extracts features from RGB images.
  • the Decoder shown in FIG. 5 is a network that transforms the extracted features into a DepthMap.
  • Pose shown in FIG. 5 is a network that estimates the moving distance and direction of the own vehicle from time-series images.
  • the information processing device thins out point cloud data from the three-dimensional point cloud data D1 (LiDAR shown in FIG. 5) (step S1) to generate the predetermined number of point cloud data D2 (LiDAR' shown in FIG. 5).
  • the information processing device extracts features from the predetermined number of point cloud data D2 (LiDAR' shown in FIG. 5) with the DepthEncoder (step S2).
  • the information processing device extracts features from the t-frame image corresponding to the three-dimensional point cloud data D1 (LiDAR shown in FIG. 5) with the RGBEncoder.
  • the information processing device converts the features extracted by the DepthEncoder and the RGBEncoder into a DepthMap with the Decoder (step S3). Then, the information processing device calculates SmoothLoss, which makes the depth map smooth (step S4). Further, the information processing device calculates DepthLoss, which is the difference between the DepthMap and the ground truth serving as the LiDAR teacher shown in FIG. 5 (step S5).
  • the information processing device calculates DepthLoss by, for example, Equation (1) below.
  • the information processing device performs machine learning by adjusting CNN parameters so that SmoothLoss and DepthLoss are minimized.
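Equation (1) is not reproduced in this text. As an illustration only, a DepthLoss of the kind described (the difference between the predicted DepthMap and the sparse LiDAR ground truth) and an edge-aware SmoothLoss are commonly computed along the following lines; this is a hedged sketch, not the patent's exact formulation.

```python
import numpy as np

def depth_loss(depth_map: np.ndarray, gt_points: np.ndarray) -> float:
    """Mean absolute difference between the predicted DepthMap and the ground
    truth D3, evaluated only at pixels where a LiDAR point remains (gt > 0)."""
    mask = gt_points > 0
    return float(np.abs(depth_map[mask] - gt_points[mask]).mean())

def smooth_loss(depth_map: np.ndarray, image: np.ndarray) -> float:
    """Edge-aware smoothness: penalize depth gradients, down-weighted where the
    RGB image itself has strong gradients (a common formulation, assumed here)."""
    d_dx = np.abs(np.diff(depth_map, axis=1))
    d_dy = np.abs(np.diff(depth_map, axis=0))
    gray = image.mean(axis=2)                      # image: (H, W, 3)
    i_dx = np.abs(np.diff(gray, axis=1))
    i_dy = np.abs(np.diff(gray, axis=0))
    return float((d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean())
```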
  • the information processing device estimates the moving distance and direction of the own vehicle from the time-series images of the t-1 frame, the t frame, and the t+1 frame using Pose (step S6).
  • the output of Pose is represented by the following formula (2).
  • the information processing device converts the estimated moving distance into a speed (step S7) and calculates Velocity Loss (step S8).
  • Velocity Loss is the difference between the speed derived from the travel distance estimated by Pose and the vehicle speed (Velocity).
  • Velocity Loss is calculated by the following formula (3).
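Formulas (2) and (3) are likewise not reproduced here. The idea of steps S7 and S8, comparing the speed implied by the Pose output with the speed obtained from the CAN, can be sketched as follows; the frame interval and the absolute-difference loss form are assumptions.

```python
import numpy as np

def velocity_loss(pose_translation: np.ndarray, can_speed_mps: float,
                  frame_interval_s: float = 0.1) -> float:
    """pose_translation: (3,) translation between consecutive frames estimated by
    the Pose network. can_speed_mps: vehicle speed at frame t from the CAN bus.
    The estimated travel distance is converted to a speed and compared with the
    CAN speed."""
    est_distance = float(np.linalg.norm(pose_translation))
    est_speed = est_distance / frame_interval_s
    return abs(est_speed - can_speed_mps)
```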
  • the information processing device generates an image of frame t from the preceding frame based on the Pose output, the DepthMap, and the camera parameters (step S9). After that, the information processing device generates a mask for removing the same object based on the time-series images of the t-1, t, and t+1 frames and the image generated in step S9 (step S10).
  • the information processing device uses the mask to remove the same object from the image generated in step S9 to generate a composite image. Then, the information processing device calculates Image Loss, which is the difference between the synthesized image and the true image (the image of frame t). The information processing device performs machine learning by adjusting the CNN parameters so that Image Loss is minimized.
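Steps S9 and S10 and the subsequent Image Loss calculation amount to warping a neighboring frame into frame t using the DepthMap, the Pose output, and the camera parameter K, masking out inconsistent pixels, and comparing the result with the real frame t. A minimal sketch under those assumptions, using nearest-neighbor sampling to keep it short:

```python
import numpy as np

def warp_to_frame_t(src_image: np.ndarray, depth_t: np.ndarray,
                    K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Synthesize frame t from a neighboring frame (e.g. frame t-1): unproject each
    pixel of frame t with its predicted depth, move it by the estimated pose (R, t),
    reproject it with K, and sample the source image at that location."""
    h, w = depth_t.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # (3, H*W)
    cam = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)               # rays * depth
    cam_src = R @ cam + t.reshape(3, 1)                                 # into source frame
    proj = K @ cam_src
    z = proj[2] + 1e-8                                                  # avoid divide-by-zero
    u_src = np.clip(np.round(proj[0] / z).astype(int), 0, w - 1)
    v_src = np.clip(np.round(proj[1] / z).astype(int), 0, h - 1)
    return src_image[v_src, u_src].reshape(h, w, -1)

def image_loss(synth: np.ndarray, real_t: np.ndarray, mask: np.ndarray) -> float:
    """Photometric difference between the synthesized image and the true frame t,
    ignoring the pixels removed by the mask generated in step S10."""
    diff = np.abs(synth.astype(float) - real_t.astype(float))
    return float(diff[mask].mean())
```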
  • the learning method according to the embodiment is a learning method executed by a computer. A depth image Dm corresponding to an image Pc is generated based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and machine learning is performed by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small.
  • ground truth can be generated using only raw data acquired from the LiDAR 53, so the amount of processing required to generate ground truth can be greatly reduced.
  • in the learning method, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the thinning rate of the point cloud data thinned out from the three-dimensional point cloud data D1 corresponding to the region of each object is changed for each type of object.
  • in the learning method, instead of uniformly thinning out the predetermined number of point cloud data D2 over the entire area of the three-dimensional point cloud data D1, an appropriate amount of point cloud data D2 can be thinned out for each region of the three-dimensional point cloud data D1.
  • the learning method according to the embodiment sets the thinning rate in regions of objects that are main detection targets lower than the thinning rate in regions of objects that are not main detection targets. According to the learning method according to the embodiment, it is possible to generate a more reliable ground truth for the regions of the main detection targets.
  • the learning program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small. As a result, the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • the information processing apparatus according to the embodiment includes an information processing unit that generates a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, performs machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small, and detects an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • the information processing device can generate the ground truth using only the raw data acquired from the LiDAR 53, so that the amount of processing required for generating the ground truth can be greatly reduced.
  • the information processing method according to the embodiment is an information processing method executed by a computer. A depth image Dm corresponding to an image Pc is generated based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; machine learning is performed by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small; and objects are detected from the three-dimensional point cloud data and images input to the convolutional neural network.
  • the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • the information processing program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small. As a result, the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • a computer-implemented learning method comprising: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • a learning program that causes a computer to execute a procedure of generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, and a procedure of performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • An information processing device comprising an information processing unit that generates a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, performs machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small, and detects an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • a computer-executed information processing method comprising: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small; and detecting an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • an information processing program for causing a computer to execute a procedure of generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, a procedure of performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small, and a procedure of detecting an object from the three-dimensional point cloud data and images input to the convolutional neural network.

Abstract

A learning method according to the present disclosure is executed by a computer. The learning method includes: generating, on the basis of a predetermined number of point cloud data items (D2) thinned out from three-dimensional point cloud data (D1) acquired by LiDAR and an image (Pc) corresponding to the three-dimensional point cloud data (D1), a depth image (Dm) corresponding to the image (Pc); and performing machine learning by adjusting a coefficient of a convolutional neural network such that, when point cloud data (D3) remaining after the predetermined number of point cloud data items (D2) is thinned out from the three-dimensional point cloud data (D1) is defined as a ground truth, a difference between the depth image (Dm) and the ground truth becomes small.

Description

Learning method, learning program, information processing device, information processing method, and information processing program
The present disclosure relates to a learning method, a learning program, an information processing device, an information processing method, and an information processing program.
There is a technology that generates a depth image from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and uses the depth image to generate teacher data for machine learning of a convolutional neural network (CNN) (see, for example, Patent Document 1).
When generating the teacher data, for example, a plurality of depth images generated from three-dimensional point cloud data scanned multiple times are added (synthesized), and a stereo image derived from the synthesized image is used to remove erroneous point cloud data, thereby generating the ground truth.
Patent Document 1: Japanese Patent Application Laid-Open No. 2021-68138
However, since CNN machine learning requires a huge amount of teacher data, the above conventional technology requires a large amount of processing to generate the ground truth.
Therefore, the present disclosure proposes a learning method, a learning program, an information processing device, an information processing method, and an information processing program that can reduce the amount of processing required to generate the ground truth.
A learning method according to the present disclosure is a computer-executed learning method that includes: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure. FIG. 2 is an explanatory diagram of a learning method according to the present disclosure. FIG. 3 is a diagram illustrating an example of an image according to the present disclosure. FIG. 4 is an explanatory diagram of label data added to an image according to the present disclosure. FIG. 5 is an explanatory diagram of processing executed by an information processing apparatus according to the present disclosure.
Embodiments of the present disclosure will be described in detail below with reference to the drawings. In each of the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.
[1. Configuration example of vehicle control system]
FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
The vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
The vehicle control ECU 21, the communication unit 22, the map information accumulation unit 23, the position information acquisition unit 24, the external recognition sensor 25, the in-vehicle sensor 26, the vehicle sensor 27, the storage unit 28, the driving support/automatic driving control unit 29, the driver monitoring system (DMS) 30, the human machine interface (HMI) 31, and the vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other. The communication network 41 is composed of, for example, an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark). Different networks may be used depending on the type of data to be transmitted; for example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data. Each part of the vehicle control system 11 may also be connected directly, without going through the communication network 41, using wireless communication intended for relatively short-range communication such as NFC (Near Field Communication) or Bluetooth (registered trademark).
The communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, and the like, and transmits and receives various data.
The map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wider area, and the like.
The position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1. The acquired position information is supplied to the driving support/automatic driving control unit 29. Note that the position information acquisition unit 24 is not limited to the method using GNSS signals and may acquire position information using, for example, a beacon.
The external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11. The type and number of sensors included in the external recognition sensor 25 are arbitrary.
For example, the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
The in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle and supplies sensor data from each sensor to each part of the vehicle control system 11. The types and number of sensors included in the in-vehicle sensor 26 are not particularly limited as long as they can realistically be installed in the vehicle 1. For example, the in-vehicle sensor 26 may include one or more of a camera, a radar, a seat sensor, a steering wheel sensor, a microphone, and a biometric sensor.
The vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11. The types and number of sensors included in the vehicle sensor 27 are not particularly limited as long as they can realistically be installed in the vehicle 1. For example, the vehicle sensor 27 includes a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU) integrating them.
The storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs. The storage unit 28 includes, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied. The storage unit 28 stores various programs and data used by each unit of the vehicle control system 11.
The driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1. For example, the driving support/automatic driving control unit 29 includes an analysis unit 61, an action planning unit 62, and an operation control unit 63.
The analysis unit 61 analyzes the vehicle 1 and its surroundings. The analysis unit 61 includes a self-position estimation unit 71, a sensor fusion unit 72, and a recognition unit 73.
The self-position estimation unit 71 estimates the self-position of the vehicle 1 based on sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
The sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
The recognition unit 73 executes detection processing for detecting the situation outside the vehicle 1 and recognition processing for recognizing the situation outside the vehicle 1.
For example, the recognition unit 73 performs detection processing and recognition processing of the situation outside the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
Specifically, for example, the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1. Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object. Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object. However, detection processing and recognition processing are not always clearly separated and may overlap.
For example, the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the clusters of points classified by the clustering. As a result, the speed and traveling direction (movement vector) of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, and the like based on image data supplied from the camera 51. The recognition unit 73 may also recognize the types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
The action planning unit 62 creates an action plan for the vehicle 1. For example, the action planning unit 62 creates an action plan by performing route planning and route following processing.
Route planning (global path planning) is the process of planning a rough route from the start to the goal. This route planning also includes trajectory planning: processing to generate, along the planned route, a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity in consideration of the motion characteristics of the vehicle 1.
The operation control unit 63 controls the operation of the vehicle 1 in order to realize the action plan created by the action planning unit 62.
The DMS 30 performs driver authentication processing, driver state recognition processing, and the like based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31 described later. The driver's state to be recognized includes, for example, physical condition, alertness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like. The HMI 31 receives input of various data, instructions, and the like, and presents various data to the driver and others.
The vehicle control unit 32 controls each unit of the vehicle 1. The vehicle control unit 32 includes a steering control unit 81, a brake control unit 82, a drive control unit 83, a body system control unit 84, a light control unit 85, and a horn control unit 86.
The steering control unit 81 detects and controls the state of the steering system of the vehicle 1. The steering system includes, for example, a steering mechanism including a steering wheel, electric power steering, and the like. The steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
The brake control unit 82 detects and controls the state of the brake system of the vehicle 1. The brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like. The brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
The drive control unit 83 detects and controls the state of the drive system of the vehicle 1. The drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels. The drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
The body system control unit 84 detects and controls the state of the body system of the vehicle 1. The body system includes, for example, a keyless entry system, a smart key system, a power window device, power seats, an air conditioner, airbags, seat belts, a shift lever, and the like. The body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
The light control unit 85 detects and controls the states of various lights of the vehicle 1. Lights to be controlled include, for example, headlights, backlights, fog lights, turn signals, brake lights, projections, bumper displays, and the like. The light control unit 85 includes a light ECU that controls the lights, an actuator that drives the lights, and the like.
The horn control unit 86 detects and controls the state of the car horn of the vehicle 1. The horn control unit 86 includes, for example, a horn ECU that controls the car horn, an actuator that drives the car horn, and the like.
[2.認識部が使用する物体検出モデルの一例]
 一般的な物体検出モデルとして、SSD(Single Shot MultiBox Detector)がある。SSDは、入力画像から物体を検出するように機械学習された畳み込みニューラルネットワーク(CNN:Convolutional Neural Network)を備える。
[2. An example of an object detection model used by the recognition unit]
A general object detection model is an SSD (Single Shot MultiBox Detector). The SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images.
CNN machine learning uses training data in which each image is annotated with the types (classes) of the objects it contains and with ground truth (GT) indicating the regions of those objects in the image. To generate such training data, for example, a depth image is generated from three-dimensional point cloud data obtained by scanning an object multiple times (for example, 11 scans) with a typical LiDAR (Light Detection And Ranging) sensor having 60 vertically arranged lasers.
The generated depth images are then added (combined), and incorrect point cloud data is removed from the combined image using a stereo image to generate the ground truth. However, because CNN machine learning requires an enormous amount of training data, the amount of processing required to generate the ground truth becomes large.
[3.本開示に係る学習方法]
 そこで、図2に示すように、認識部73に含まれる情報処理装置は、縦に128個のレーザを備えたLiDAR53によって取得される3次元点群データD1から間引いた所定数の点群データD2と、3次元点群データD1に対応する画像Pcとに基づいて画像Pcに対応する深度画像Dmを生成する。
[3. Learning method according to the present disclosure]
Therefore, as shown in FIG. 2, the information processing device included in the recognition unit 73 generates a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53, which has 128 vertically arranged lasers, and the image Pc corresponding to the three-dimensional point cloud data D1.
The information processing device then uses the point cloud data D3 that remains after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 as the ground truth, and performs machine learning by adjusting the coefficients of the CNN so that the difference between the depth image Dm and the ground truth becomes small.
In this way, in the learning method according to the present disclosure, the ground truth (the point cloud data D3 remaining after thinning) can be generated simply by thinning a predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 acquired by the LiDAR 53. Therefore, compared with the general training data generation method described above, the learning method according to the present disclosure can greatly reduce the amount of processing required to generate the ground truth.
When the predetermined number of point cloud data D2 is thinned out from the three-dimensional point cloud data D1, the amount of data (number of points) to be thinned out is kept below 50% of the amount of data (number of points) in the three-dimensional point cloud data D1. This makes it possible to generate a highly reliable ground truth (the point cloud data D3 remaining after thinning) that contains more of the features of the image Pc than the predetermined number of point cloud data D2, which is one of the inputs of the training data.
Methods of thinning a predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 include, for example, thinning out random data points from the three-dimensional point cloud data D1 arranged in a matrix, thinning out one row of data points every several rows, and thinning out one column of data points every several columns.
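As a rough sketch of these thinning schemes, assuming the three-dimensional point cloud data D1 is held as a rows-by-columns range image in which NaN marks pixels with no LiDAR return (this representation, and the function and parameter names below, are illustrative assumptions and are not taken from the application):

```python
import numpy as np

def split_point_cloud(d1, mode="random", rate=0.4, step=4, rng=None):
    """Split the range image D1 into the thinned-out points D2 (the sparse
    input to the network) and the remaining points D3 (the ground truth)."""
    rng = np.random.default_rng(0) if rng is None else rng
    thin = np.zeros(d1.shape, dtype=bool)
    if mode == "random":
        thin = rng.random(d1.shape) < rate       # thin out a random fraction
    elif mode == "rows":
        thin[::step, :] = True                   # thin out one row every `step` rows
    elif mode == "columns":
        thin[:, ::step] = True                   # thin out one column every `step` columns
    d2 = np.where(thin, d1, np.nan)              # points thinned out of D1
    d3 = np.where(thin, np.nan, d1)              # points left in D1 (ground truth)
    return d2, d3

# Example: thin out 40% of a 128 x 1024 scan at random (kept below 50%).
d1 = np.random.uniform(1.0, 80.0, size=(128, 1024))
d2, d3 = split_point_cloud(d1, mode="random", rate=0.4)
```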
In addition, some images Pc have label data attached that includes data indicating the regions of subjects in the image Pc and data indicating the types (classes) of those subjects. Therefore, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the information processing device changes, for each type of object, the thinning rate of the point cloud data thinned out of the three-dimensional point cloud data corresponding to the region of that object.
As a result, rather than uniformly thinning a predetermined number of point cloud data D2 out of the entire three-dimensional point cloud data D1, the information processing device can thin out an appropriate amount of point cloud data D2 for each region of the three-dimensional point cloud data D1 according to the characteristics of the subject and the purpose of object detection.
For example, as shown in FIG. 3, the image Pc shows a vehicle Vc, a plurality of poles Po, and a background Bg. As shown in FIG. 4, label data LVc indicating the region of the vehicle Vc and indicating that the object in that region is the vehicle Vc is added to the image Pc.
Further, label data LPo indicating the regions of the poles Po and indicating that the objects in those regions are poles Po, and label data LBg indicating the region of the background Bg and indicating that the object in that region is the background Bg, are also added to the image Pc.
 この場合、情報処理装置は、検出対象として主要な物体の領域における間引き率を、検出対象として主要でない物体の領域における間引き率よりも低くする。換言すれば、情報処理装置は、検出対象として主要な物体の領域に残す点群データの量を、検出対象として主要でない物体の領域における点群データの量よりも多くする。これにより、情報処理装置は、検出対象として主要な物体の領域に対してより信頼性の高いグランドトゥルースを生成することができる。 In this case, the information processing apparatus sets the thinning rate in the area of the main object as the detection target lower than the thinning rate in the area of the non-main object as the detection target. In other words, the information processing apparatus makes the amount of point cloud data to be left in the area of the main object as the detection target larger than the amount of the point cloud data in the area of the object that is not the main detection target. Thereby, the information processing apparatus can generate ground truth with higher reliability for the region of the main object as the detection target.
For example, when the purpose of object detection is to detect other vehicles Vc while the own vehicle is traveling, the information processing device thins out, for the region of the vehicle Vc, a predetermined number of point cloud data D2 amounting to 50% of the three-dimensional point cloud data D1, and leaves the remaining 50% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
When the poles Po are not detection targets, the information processing device thins out, for the regions of the poles Po, a predetermined number of point cloud data D2 amounting to 80% of the three-dimensional point cloud data D1, and leaves the remaining 20% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
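A per-class variant of the same split could look like the sketch below, assuming the label data (LVc, LPo, LBg) has already been rasterized into a per-pixel class map aligned with the range image; the class names and thinning rates follow the 50% / 80% figures above but are otherwise illustrative assumptions:

```python
import numpy as np

# Illustrative thinning rates: the main detection target (vehicle) gets a
# lower rate, so more of its points remain as ground truth D3.
THINNING_RATE = {"vehicle": 0.5, "pole": 0.8, "background": 0.8}

def split_by_class(d1, class_map, rng=None):
    """d1: range image (NaN = no return); class_map: same-shape array of
    class names derived from the label data attached to the image Pc."""
    rng = np.random.default_rng(0) if rng is None else rng
    thin = np.zeros(d1.shape, dtype=bool)
    for name, rate in THINNING_RATE.items():
        region = (class_map == name)
        thin |= region & (rng.random(d1.shape) < rate)
    d2 = np.where(thin, d1, np.nan)    # thinned-out points (network input)
    d3 = np.where(thin, np.nan, d1)    # remaining points (ground truth)
    return d2, d3
```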
 この場合、ポールPoのグランドトゥルースとなる間引かれて残った点群データD3のデータ量が少なくなるが、ポールPoは、LiDAR53のレーザが当たり難い物体である。このため、情報処理装置は、ポールPoの検出を行う場合には、主に画像Pcのデータを使用する。情報処理装置は、記憶部28に記憶された情報処理プログラムを実行することによって、上記したCNNの機械学習および物体検出処理を行う。 In this case, the amount of data in the point cloud data D3 remaining after thinning, which becomes the ground truth of the pole Po, is reduced, but the pole Po is an object that is difficult for the laser of the LiDAR 53 to hit. Therefore, the information processing apparatus mainly uses the data of the image Pc when detecting the pole Po. The information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
[4.情報処理装置が実行する処理]
 図5は、本開示に係る情報処理装置が実行する処理の説明図である。図5に示すLiDARは、LiDAR53から取得する3次元点群データD1を立面画像にしたデータである。図5に示すLiDAR´は、立面画像から点群データを間引いた所定数の点群データD2である。図5に示すt-1フレーム、tフレーム、t+1フレームは、間引かれて残った点群データD3に対応する時系列に3連続で撮像されたRGB画像である。
[4. Processing executed by information processing device]
FIG. 5 is an explanatory diagram of processing executed by the information processing device according to the present disclosure. "LiDAR" shown in FIG. 5 is the three-dimensional point cloud data D1 acquired from the LiDAR 53 rendered as an elevation image. "LiDAR´" shown in FIG. 5 is the predetermined number of point cloud data D2 obtained by thinning out point cloud data from that elevation image. The t−1 frame, t frame, and t+1 frame shown in FIG. 5 are RGB images captured at three consecutive points in the time series corresponding to the point cloud data D3 remaining after thinning.
The camera parameter K shown in FIG. 5 is an internal parameter of the camera 51, used to convert UV coordinates whose origin is the upper-left corner of the image Pc into camera coordinates centered on the camera 51. Velocity shown in FIG. 5 is the vehicle speed at the t frame obtained from the communication network 41 (CAN).
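The role of the camera parameter K can be illustrated with the usual pinhole back-projection, x_cam = depth · K⁻¹ · (u, v, 1)ᵀ; the intrinsic values below are placeholders and are not those of the camera 51:

```python
import numpy as np

def uv_to_camera(u, v, depth, K):
    """Back-project a pixel (u, v) with known depth into camera coordinates
    centered on the camera, using the pinhole model."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

# Placeholder intrinsics: focal lengths and principal point in pixels.
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
print(uv_to_camera(800, 400, 12.5, K))  # -> 3D point in the camera frame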
 図5に示すDepthencorderは、3次元点群データから特徴を抽出するネットワークである。図5に示すRGBencorderは、RGB画像から特徴を抽出するネットワークである。図5に示すDecorderは、抽出された特徴をDepthMapに変換するネットワークである。図5に示すPoseは、時系列な画像から自車両の移動距離と向きを推定するネットワークである。 Depthencorder shown in FIG. 5 is a network that extracts features from 3D point cloud data. RGBencoder shown in FIG. 5 is a network for extracting features from RGB images. The Decorder shown in FIG. 5 is a network that transforms the extracted features into a DepthMap. Pose shown in FIG. 5 is a network for estimating the moving distance and direction of the own vehicle from time-series images.
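FIG. 5 only names these networks, so the skeleton below is a minimal illustrative layout (layer sizes, activations, and class names are assumptions) showing how the Depthencorder, RGBencorder, Decorder, and Pose components could fit together in PyTorch:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared shape for the Depthencorder / RGBencorder: a small convolutional
    stack that maps an input map (sparse depth or RGB) to a feature map."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class DepthDecoder(nn.Module):
    """Decorder role: fuse the two feature maps and upsample back to a DepthMap."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Softplus(),  # strictly positive depth
        )

    def forward(self, f_depth, f_rgb):
        return self.net(torch.cat([f_depth, f_rgb], dim=1))

class PoseNet(nn.Module):
    """Pose role: estimate a 6-DoF relative motion (translation + rotation)
    from two consecutive RGB frames stacked along the channel axis."""
    def __init__(self):
        super().__init__()
        self.features = Encoder(in_ch=6)
        self.head = nn.Conv2d(64, 6, 1)

    def forward(self, img_a, img_b):
        f = self.features(torch.cat([img_a, img_b], dim=1))
        return self.head(f).mean(dim=(2, 3))  # [tx, ty, tz, rx, ry, rz]
```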
When the three-dimensional point cloud data D1 (LiDAR in FIG. 5) is input from the LiDAR 53, the information processing device thins out point cloud data from the three-dimensional point cloud data D1 (step S1) and generates the predetermined number of point cloud data D2 (LiDAR´ in FIG. 5).
The information processing device then extracts features from the predetermined number of point cloud data D2 (LiDAR´ in FIG. 5) with the Depthencorder (step S2). The information processing device also extracts features from the t-frame image corresponding to the three-dimensional point cloud data D1 (LiDAR in FIG. 5) with the RGBencorder.
After that, the information processing device converts the features extracted by the Depthencorder and the RGBencorder into a DepthMap with the Decorder (step S3). The information processing device then calculates a SmoothLoss that encourages the DepthMap to be smooth (step S4). Further, the information processing device calculates a DepthLoss, which is the difference between the DepthMap and the ground truth that serves as the teacher for the LiDAR data shown in FIG. 5 (step S5).
The information processing device calculates the DepthLoss by, for example, Equation (1) below.
[Equation (1): formula image not reproduced in this text.]
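Equation (1) is published only as an image, so the sketch below shows one common stand-in consistent with the description: a masked L1 difference between the predicted DepthMap and the sparse ground truth D3, together with an edge-aware SmoothLoss. Both exact forms are assumptions, not the application's formulas.

```python
import torch

def depth_loss(depth_map, gt_depth):
    """Masked L1 difference between the predicted DepthMap and the sparse
    ground truth D3; pixels with no ground-truth point are marked by 0."""
    mask = gt_depth > 0
    return (depth_map[mask] - gt_depth[mask]).abs().mean()

def smooth_loss(depth_map, image):
    """Edge-aware smoothness: penalize depth gradients, weighted down at
    image edges (a common formulation, assumed here).
    depth_map: (B,1,H,W), image: (B,3,H,W)."""
    dz_x = (depth_map[..., :, 1:] - depth_map[..., :, :-1]).abs()
    dz_y = (depth_map[..., 1:, :] - depth_map[..., :-1, :]).abs()
    di_x = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    di_y = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (dz_x * torch.exp(-di_x)).mean() + (dz_y * torch.exp(-di_y)).mean()
```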
 情報処理装置は、SmoothLossおよびDepthLossが最小となるように、CNNのパラメータを調整して機械学習する。 The information processing device performs machine learning by adjusting CNN parameters so that SmoothLoss and DepthLoss are minimized.
The information processing device also uses Pose to estimate the moving distance and orientation of the own vehicle from the time-series images of the t−1, t, and t+1 frames (step S6). The output of Pose is represented by Equation (2) below.
[Equation (2): formula image not reproduced in this text.]
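Equation (2) is likewise published only as an image. A common parameterization of a pose output is a translation plus an axis-angle rotation; the hypothetical helper below converts such a 6-vector into a 4x4 rigid transform and is an assumption rather than the application's formula.

```python
import numpy as np

def pose_to_matrix(pose6):
    """Turn a 6-vector [tx, ty, tz, rx, ry, rz] (translation + axis-angle
    rotation) into a 4x4 rigid transform using Rodrigues' formula."""
    t, r = np.asarray(pose6[:3]), np.asarray(pose6[3:])
    theta = np.linalg.norm(r)
    if theta < 1e-8:
        R = np.eye(3)
    else:
        k = r / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```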
Further, the information processing device converts the estimated moving distance into a speed (step S7) and calculates a velocity Loss (step S8). The velocity Loss is the difference between the moving distance estimated by Pose and the moving distance converted from the vehicle speed. The velocity Loss is calculated by Equation (3) below.
[Equation (3): formula image not reproduced in this text.]
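Equation (3) is also published only as an image. A plausible reading of the description (the distance estimated by Pose is compared against the distance implied by the CAN vehicle speed over one frame interval) is the L1 consistency term sketched below; the function and argument names are assumptions.

```python
import torch

def velocity_loss(pose_translation, can_speed, frame_dt):
    """pose_translation: (B, 3) translation from Pose for one frame step;
    can_speed: vehicle speed from the CAN [m/s]; frame_dt: frame interval [s]."""
    est_distance = pose_translation.norm(dim=-1)   # metres moved per frame
    can_distance = can_speed * frame_dt            # speed [m/s] * dt [s]
    return (est_distance - can_distance).abs().mean()
```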
The information processing device also generates the image of the t frame from the preceding frame on the basis of the Pose output, the DepthMap, and the camera parameter (step S9). After that, the information processing device generates a mask for removing the same object on the basis of the time-series images of the t−1, t, and t+1 frames and the image generated in step S9 (step S10).
Subsequently, the information processing device removes the same object from the image generated in step S9 using the mask to generate a synthesized image. The information processing device then calculates an image Loss, which is the difference between the synthesized image and the true image (the image of the t frame). The information processing device performs machine learning by adjusting the CNN parameters so that the image Loss is minimized.
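Steps S9 and S10 correspond to a standard view-synthesis setup. The sketch below assumes that form (inverse-warping a neighbouring frame into the t frame with the predicted depth, the Pose output, and K, then taking a masked L1 photometric difference); the helper names and the grid_sample-based warp are assumptions rather than the application's exact procedure.

```python
import torch
import torch.nn.functional as F

def reconstruct_t(source_img, depth_t, T_t_to_s, K):
    """Inverse-warp a neighbouring frame into the t frame.
    source_img: (B,3,H,W), depth_t: (B,1,H,W), T_t_to_s: (B,4,4), K: (B,3,3)."""
    B, _, H, W = depth_t.shape
    device = depth_t.device
    v, u = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1)
    cam = torch.inverse(K) @ pix * depth_t.reshape(B, 1, -1)      # back-project
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = (T_t_to_s @ cam_h)[:, :3]                               # move to source frame
    proj = K @ src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)               # project to pixels
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(source_img, grid, padding_mode="border", align_corners=True)

def image_loss(recon_t, true_t, mask):
    """Masked L1 photometric difference between the reconstructed and true
    t-frame images (mask from step S10; exact form assumed)."""
    diff = (recon_t - true_t).abs().mean(dim=1, keepdim=True)
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```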
[5.効果]
 実施形態に係る学習方法は、コンピュータが実行する学習方法であって、LiDAR53によって取得される3次元点群データD1から間引いた所定数の点群データD2と、3次元点群データD1に対応する画像Pcとに基づいて画像に対応する深度画像Dmを生成し、3次元点群データD1から所定数の点群データD2が間引かれて残った点群データD3をグランドトゥルースとして、深度画像Dmとグランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する。実施形態に係る学習方法によれば、LiDAR53から取得するロウデータのみを使用してグランドトゥルースを生成できるので、グランドトゥルースの生成に要する処理量を大幅に低減できる。
[5. Effects]
The learning method according to the embodiment is a learning method executed by a computer, in which a depth image Dm corresponding to an image Pc is generated on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and machine learning is performed by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth. According to the learning method of the embodiment, the ground truth can be generated using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
In addition, in the learning method according to the embodiment, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the thinning rate of the point cloud data thinned out of the three-dimensional point cloud data D1 corresponding to the region of the object is changed for each type of object. According to the learning method of the embodiment, rather than uniformly thinning a predetermined number of point cloud data D2 out of the entire three-dimensional point cloud data D1, an appropriate amount of point cloud data D2 can be thinned out for each region of the three-dimensional point cloud data D1 according to the characteristics of the subject and the purpose of object detection.
 また、実施形態に係る学習方法は、検出対象として主要な物体の領域における間引き率を、検出対象として主要でない物体の領域における間引き率よりも低くする。実施形態に係る学習方法によれば、検出対象として主要な物体の領域に対してより信頼性の高いグランドトゥルースを生成することができる。 In addition, the learning method according to the embodiment sets the thinning rate in areas of objects that are primary as detection targets to be lower than the thinning rate in areas of objects that are not primary as detection targets. According to the learning method according to the embodiment, it is possible to generate ground truth with higher reliability for regions of main objects as detection targets.
The learning program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing device according to the embodiment includes an information processing unit that generates a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, performs machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth, and detects objects from three-dimensional point cloud data and images input to the convolutional neural network. This allows the information processing device to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing method according to the embodiment is an information processing method executed by a computer, in which a depth image Dm corresponding to an image Pc is generated on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; machine learning is performed by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth; and objects are detected from three-dimensional point cloud data and images input to the convolutional neural network. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth; and a procedure of detecting objects from three-dimensional point cloud data and images input to the convolutional neural network. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
 なお、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、また他の効果があってもよい。 It should be noted that the effects described in this specification are only examples and are not limited, and other effects may also occur.
 なお、本技術は以下のような構成も取ることができる。
(1)
 コンピュータが実行する学習方法であって、
 LiDAR(Light Detection And Ranging)によって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する
 ことを含む学習方法。
(2)
 前記画像に写る物体の種類を示すラベルと、前記画像における前記物体の領域を示すデータとが前記画像に対応付けられている場合に、前記物体の種類毎に、前記物体の領域に対応する前記3次元点群データから間引く点群データの間引き率を変更する
 前記(1)に記載の学習方法。
(3)
 検出対象として主要な前記物体の領域における前記間引き率を、検出対象として主要でない前記物体の領域における前記間引き率よりも低くする
 前記(2)に記載の学習方法。
(4)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と
 をコンピュータに実行させる学習プログラム。
(5)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する情報処理部
 を備える情報処理装置。
(6)
 コンピュータが実行する情報処理方法であって、
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する
 ことを含む情報処理方法。
(7)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する手順と
 をコンピュータに実行させる情報処理プログラム。
Note that the present technology can also take the following configuration.
(1)
A computer implemented learning method comprising:
generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and
performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
(2)
The learning method according to (1) above, wherein, when a label indicating a type of an object appearing in the image and data indicating a region of the object in the image are associated with the image, a thinning rate of point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object is changed for each type of the object.
(3)
The learning method according to (2), wherein the thinning rate in the object area that is the main detection target is lower than the thinning rate in the object area that is not the main detection target.
(4)
A learning program causing a computer to execute:
a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; and
a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
(5)
An information processing device comprising an information processing unit that:
generates a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
performs machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
detects an object from three-dimensional point cloud data and an image input to the convolutional neural network.
(6)
A computer-executed information processing method comprising:
generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
(7)
An information processing program causing a computer to execute:
a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
a procedure of detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
Pc: Image
D1: Three-dimensional point cloud data
D2: Predetermined number of point cloud data (thinned out)
D3: Point cloud data remaining after thinning
Dm: Depth image

Claims (7)

  1.  コンピュータが実行する学習方法であって、
     LiDAR(Light Detection And Ranging)によって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する
     ことを含む学習方法。
    A computer implemented learning method comprising:
    generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and
    performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
  2.  前記画像に写る物体の種類を示すラベルと、前記画像における前記物体の領域を示すデータとが前記画像に対応付けられている場合に、前記物体の種類毎に、前記物体の領域に対応する前記3次元点群データから間引く点群データの間引き率を変更する
     請求項1に記載の学習方法。
    2. The learning method according to claim 1, wherein, when a label indicating a type of an object appearing in the image and data indicating a region of the object in the image are associated with the image, a thinning rate of point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object is changed for each type of the object.
  3.  検出対象として主要な前記物体の領域における前記間引き率を、検出対象として主要でない前記物体の領域における前記間引き率よりも低くする
     請求項2に記載の学習方法。
    3. The learning method according to claim 2, wherein the thinning rate in areas of the object that are primary as detection targets is set lower than the thinning rate in areas of the object that are not primary as detection targets.
  4.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と
     をコンピュータに実行させる学習プログラム。
    A learning program causing a computer to execute:
    a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; and
    a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
  5.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する情報処理部
     を備える情報処理装置。
    An information processing device comprising an information processing unit that:
    generates a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    performs machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    detects an object from three-dimensional point cloud data and an image input to the convolutional neural network.
  6.  コンピュータが実行する情報処理方法であって、
     LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する
     ことを含む情報処理方法。
    A computer-executed information processing method comprising:
    generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
  7.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する手順と
     をコンピュータに実行させる情報処理プログラム。
    An information processing program causing a computer to execute:
    a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    a procedure of detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
PCT/JP2022/038868 2021-11-09 2022-10-19 Learning method, learning program, information processing device, information processing method, and information processing program WO2023085017A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-182831 2021-11-09
JP2021182831 2021-11-09

Publications (1)

Publication Number Publication Date
WO2023085017A1 true WO2023085017A1 (en) 2023-05-19

Family

ID=86335676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/038868 WO2023085017A1 (en) 2021-11-09 2022-10-19 Learning method, learning program, information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2023085017A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000090272A (en) * 1998-09-16 2000-03-31 Hitachi Zosen Corp Selecting method for shoes
WO2020053611A1 (en) * 2018-09-12 2020-03-19 Toyota Motor Europe Electronic device, system and method for determining a semantic grid of an environment of a vehicle
WO2020116195A1 (en) * 2018-12-07 2020-06-11 ソニーセミコンダクタソリューションズ株式会社 Information processing device, information processing method, program, mobile body control device, and mobile body
JP2020146449A (en) * 2019-03-06 2020-09-17 国立大学法人九州大学 Magnetic resonance image high-speed reconfiguring method and magnetic resonance imaging device

Similar Documents

Publication Publication Date Title
US11531354B2 (en) Image processing apparatus and image processing method
JP7188394B2 (en) Image processing device and image processing method
JP7180670B2 (en) Control device, control method and program
JPWO2019082670A1 (en) Information processing equipment, information processing methods, programs, and mobiles
JPWO2019077999A1 (en) Image pickup device, image processing device, and image processing method
US20240054793A1 (en) Information processing device, information processing method, and program
KR20220020804A (en) Information processing devices and information processing methods, and programs
WO2021241189A1 (en) Information processing device, information processing method, and program
CN110281934A (en) Controller of vehicle, control method for vehicle and storage medium
US20230215151A1 (en) Information processing apparatus, information processing method, information processing system, and a program
WO2019150918A1 (en) Information processing device, information processing method, program, and moving body
US20220277556A1 (en) Information processing device, information processing method, and program
JP7198742B2 (en) AUTOMATED DRIVING VEHICLE, IMAGE DISPLAY METHOD AND PROGRAM
WO2023153083A1 (en) Information processing device, information processing method, information processing program, and moving device
JPWO2020036043A1 (en) Information processing equipment, information processing methods and programs
WO2023085017A1 (en) Learning method, learning program, information processing device, information processing method, and information processing program
US20230245423A1 (en) Information processing apparatus, information processing method, and program
US20230289980A1 (en) Learning model generation method, information processing device, and information processing system
WO2021193103A1 (en) Information processing device, information processing method, and program
WO2023085190A1 (en) Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program
WO2023021755A1 (en) Information processing device, information processing system, model, and model generation method
WO2023054090A1 (en) Recognition processing device, recognition processing method, and recognition processing system
WO2023090001A1 (en) Information processing device, information processing method, and program
WO2024024471A1 (en) Information processing device, information processing method, and information processing system
US20230410486A1 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892517

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023559513

Country of ref document: JP