WO2023085017A1 - Learning method, learning program, information processing device, information processing method, and information processing program - Google Patents

Learning method, learning program, information processing device, information processing method, and information processing program

Info

Publication number
WO2023085017A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud data
three-dimensional point cloud
image
information processing
Prior art date
Application number
PCT/JP2022/038868
Other languages
French (fr)
Japanese (ja)
Inventor
周平 花澤
Original Assignee
Sony Semiconductor Solutions Corporation
Priority date
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation
Publication of WO2023085017A1 publication Critical patent/WO2023085017A1/en

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 - Lidar systems specially adapted for specific applications
    • G01S17/89 - Lidar systems specially adapted for specific applications for mapping or imaging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis

Definitions

  • the present disclosure relates to a learning method, a learning program, an information processing device, an information processing method, and an information processing program.
  • when generating teacher data, for example, a plurality of depth images generated from three-dimensional point cloud data scanned multiple times are added (synthesized), and a stereo image derived from the synthesized image is used to remove erroneous point cloud data, thereby generating the ground truth.
  • the present disclosure proposes a learning method, a learning program, an information processing device, an information processing method, and an information processing program that can reduce the amount of processing required to generate ground truth.
  • a learning method according to the present disclosure is a computer-executed learning method that includes: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure
  • FIG. 2 is an explanatory diagram of a learning method according to the present disclosure
  • FIG. 3 is a diagram illustrating an example of an image according to the present disclosure
  • FIG. 4 is an explanatory diagram of label data added to an image according to the present disclosure
  • FIG. 5 is an explanatory diagram of processing executed by an information processing apparatus according to the present disclosure
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
  • the vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
  • the vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
  • Vehicle control ECU 21, communication unit 22, map information storage unit 23, position information acquisition unit 24, external recognition sensor 25, in-vehicle sensor 26, vehicle sensor 27, storage unit 28, driving support/automatic driving control unit 29, driver monitoring system (DMS) 30, human machine interface (HMI) 31, and vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other.
  • the communication network 41 is composed of, for example, an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark).
  • different networks within the communication network 41 may be used depending on the type of data to be transmitted.
  • CAN may be applied to data related to vehicle control
  • Ethernet may be applied to large-capacity data.
  • each part of the vehicle control system 11 may be connected directly, without going through the communication network 41, using wireless communication intended for relatively short-range communication such as NFC (Near Field Communication) or Bluetooth (registered trademark).
  • the communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, etc., and transmits and receives various data.
  • the map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wider area, and the like.
  • the position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1 .
  • the acquired position information is supplied to the driving support/automatic driving control unit 29 .
  • the location information acquisition unit 24 is not limited to the method using GNSS signals, and may acquire location information using beacons, for example.
  • the external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the type and number of sensors included in the external recognition sensor 25 are arbitrary.
  • the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
  • the in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle, and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the types and number of various sensors included in the in-vehicle sensor 26 are not particularly limited as long as they are the types and number that can be realistically installed in the vehicle 1 .
  • in-vehicle sensors 26 may comprise one or more of cameras, radar, seat sensors, steering wheel sensors, microphones, biometric sensors.
  • the vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1, and supplies sensor data from each sensor to each section of the vehicle control system 11.
  • the types and number of various sensors included in the vehicle sensor 27 are not particularly limited as long as the types and number are practically installable in the vehicle 1 .
  • the vehicle sensor 27 includes a velocity sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU (Inertial Measurement Unit)) integrating them.
  • the storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs.
  • the storage unit 28 includes, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied.
  • the storage unit 28 stores various programs and data used by each unit of the vehicle control system 11 .
  • the driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1 .
  • the driving support/automatic driving control unit 29 includes an analysis unit 61 , an action planning unit 62 and an operation control unit 63 .
  • the analysis unit 61 analyzes the vehicle 1 and its surroundings.
  • the analysis unit 61 includes a self-position estimation unit 71 , a sensor fusion unit 72 and a recognition unit 73 .
  • the self-position estimation unit 71 estimates the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
  • the sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
  • the recognition unit 73 executes a detection process for detecting the situation outside the vehicle 1 and a recognition process for recognizing the situation outside the vehicle 1 .
  • the recognition unit 73 performs detection processing and recognition processing of the external situation of the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
  • the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1 .
  • Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object.
  • Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object.
  • detection processing and recognition processing are not always clearly separated, and may overlap.
  • the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
  • the recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the masses of point groups classified by clustering. As a result, the speed and traveling direction (movement vector) of the object around the vehicle 1 are detected.
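For illustration only, the clustering-based detection described above can be sketched as follows. This is not the actual implementation of the recognition unit 73; the DBSCAN parameters and the returned fields are assumptions.

```python
# Illustrative sketch: cluster a LiDAR point cloud into object candidates and
# report rough position and size, in the spirit of the clustering described above.
# The eps/min_points values are assumptions, not taken from the patent.
import numpy as np
from sklearn.cluster import DBSCAN

def detect_objects_from_points(points_xyz: np.ndarray,
                               eps: float = 0.5,
                               min_points: int = 10) -> list[dict]:
    """points_xyz: (N, 3) array of LiDAR points in vehicle coordinates."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points_xyz)
    objects = []
    for label in set(labels):
        if label == -1:                      # -1 marks noise points (no cluster)
            continue
        cluster = points_xyz[labels == label]
        objects.append({
            "center": cluster.mean(axis=0),                      # rough position
            "size": cluster.max(axis=0) - cluster.min(axis=0),   # rough extent
            "num_points": len(cluster),
        })
    return objects
```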
  • the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, etc. based on image data supplied from the camera 51 . Further, the recognition unit 73 may recognize types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
  • the action plan section 62 creates an action plan for the vehicle 1.
  • the action planning unit 62 creates an action plan by performing route planning and route following processing.
  • global path planning is the process of planning a rough route from the start to the goal. This route planning also includes trajectory planning: processing to generate, along the planned route, a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity in consideration of the motion characteristics of the vehicle 1.
  • the motion control unit 63 controls the motion of the vehicle 1 in order to implement the action plan created by the action planning unit 62.
  • the DMS 30 performs driver authentication processing, driver state recognition processing, etc., based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31, which will be described later.
  • the driver's state to be recognized includes, for example, physical condition, alertness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like.
  • the HMI 31 receives input of various data, instructions, and the like, and presents various data to the driver and others.
  • the vehicle control unit 32 controls each unit of the vehicle 1.
  • the vehicle control section 32 includes a steering control section 81 , a brake control section 82 , a drive control section 83 , a body system control section 84 , a light control section 85 and a horn control section 86 .
  • the steering control unit 81 detects and controls the state of the steering system of the vehicle 1 .
  • the steering system includes, for example, a steering mechanism including a steering wheel, an electric power steering, and the like.
  • the steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
  • the brake control unit 82 detects and controls the state of the brake system of the vehicle 1 .
  • the brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like.
  • the brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
  • the drive control unit 83 detects and controls the state of the drive system of the vehicle 1 .
  • the drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels.
  • the drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
  • the body system control unit 84 detects and controls the state of the body system of the vehicle 1 .
  • the body system includes, for example, a keyless entry system, smart key system, power window device, power seat, air conditioner, air bag, seat belt, shift lever, and the like.
  • the body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
  • the light control unit 85 detects and controls the states of various lights of the vehicle 1 .
  • Lights to be controlled include, for example, headlights, backlights, fog lights, turn signals, brake lights, projections, bumper displays, and the like.
  • the light control unit 85 includes a light ECU that controls the light, an actuator that drives the light, and the like.
  • the horn control unit 86 detects and controls the state of the car horn of the vehicle 1 .
  • the horn control unit 86 includes, for example, a horn ECU for controlling the car horn, an actuator for driving the car horn, and the like.
  • a typical object detection model is the SSD (Single Shot MultiBox Detector).
  • the SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images.
  • CNN machine learning uses teacher data in which each image is given the types (classes) of objects contained in the image and the ground truth (GT) indicating the regions of those objects in the image.
  • when generating the teacher data, for example, a depth image is generated from three-dimensional point cloud data obtained by scanning an object multiple times (for example, 11 scans) with a general LiDAR (Light Detection And Ranging) equipped with 60 vertical lasers.
  • the information processing device included in the recognition unit 73 generates a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 having 128 vertical lasers, and the image Pc corresponding to the three-dimensional point cloud data D1.
  • the information processing device uses the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 as the ground truth, and performs machine learning by adjusting the coefficients of the CNN so that the difference between the depth image Dm and the ground truth becomes small.
  • in the learning method according to the present disclosure, the ground truth (the point cloud data D3 remaining after thinning) can be generated simply by thinning the predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 acquired by the LiDAR 53. Therefore, according to the learning method according to the present disclosure, it is possible to greatly reduce the amount of processing required to generate the ground truth compared to the general method of generating teacher data described above.
  • when thinning out the predetermined number of point cloud data D2 from the three-dimensional point cloud data D1, the data amount (number of points) of the point cloud data D2 to be thinned out is set to less than 50% of the data amount (number of points) of the three-dimensional point cloud data D1.
  • methods of thinning out the predetermined number of point cloud data D2 from the three-dimensional point cloud data D1 include, for example, a method of thinning out random data points from the three-dimensional point cloud data D1 arranged in a matrix, and a method of thinning out the data points of one column at a time at regular column intervals.
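A minimal sketch of such random thinning, assuming the scan of the three-dimensional point cloud data D1 is stored as an H x W range image (the function name and the default rate are illustrative assumptions, not from the patent):

```python
import numpy as np

def split_point_cloud(range_image: np.ndarray, thin_rate: float = 0.4, seed: int = 0):
    """Randomly thin points out of a LiDAR scan arranged in a matrix.

    range_image: (H, W) depth values of the 3D point cloud D1 (0 where no return).
    thin_rate:   fraction of valid points to thin out (kept below 50% here,
                 following the guideline stated above).
    Returns (d2, d3): d2 holds the thinned-out points used as the network input,
    d3 holds the remaining points used as the ground truth.
    """
    rng = np.random.default_rng(seed)
    valid = range_image > 0
    thin_mask = valid & (rng.random(range_image.shape) < thin_rate)

    d2 = np.where(thin_mask, range_image, 0.0)            # thinned-out subset (input)
    d3 = np.where(valid & ~thin_mask, range_image, 0.0)   # remaining subset (ground truth)
    return d2, d3
```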
  • some images Pc are given label data including data indicating the region of a subject in the image Pc and data indicating the type (class) of the subject. Therefore, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the information processing device changes, for each type of object, the thinning rate of the point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object.
  • instead of uniformly thinning out the predetermined number of point cloud data D2 over the entire area of the three-dimensional point cloud data D1, the information processing apparatus can thereby thin out an appropriate amount of point cloud data D2 for each region of the three-dimensional point cloud data D1 according to the characteristics of the object and the purpose of object detection.
  • the image Pc includes a vehicle Vc, a plurality of poles Po, and a background Bg.
  • the image Pc is given label data LVc indicating the region of the vehicle Vc and that the object in the region is the vehicle Vc.
  • the image Pc is also given label data LPo indicating the region of the pole Po and that the object in that region is the pole Po, and label data LBg indicating the region of the background Bg and that the object in that region is the background Bg.
  • the information processing apparatus sets the thinning rate in the region of an object that is a main detection target lower than the thinning rate in the region of an object that is not a main detection target. In other words, the information processing apparatus leaves a larger amount of point cloud data in the region of a main detection target than in the region of an object that is not a main detection target. Thereby, the information processing apparatus can generate a more reliable ground truth for the region of the main detection target.
  • for the region of the vehicle Vc, for example, the information processing device thins out from the three-dimensional point cloud data D1 a predetermined number of point cloud data D2 corresponding to 50% of the points in the region, and leaves the remaining 50% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
  • for the region of the pole Po, the information processing device thins out from the three-dimensional point cloud data D1 a predetermined number of point cloud data D2 corresponding to 80% of the points in the region, and leaves the remaining 20% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
  • the information processing apparatus mainly uses the data of the image Pc when detecting the pole Po.
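The class-dependent thinning described in this example might look like the following sketch. The label encoding, the rate table, and the projection of the labels LVc/LPo/LBg onto the range image are assumptions made for illustration; only the 50%/80% rates come from the example above.

```python
import numpy as np

# Assumed per-class thinning rates: a main detection target (vehicle) keeps more
# ground-truth points (lower thinning rate) than non-main targets (pole, background).
THIN_RATE_BY_CLASS = {
    "vehicle": 0.5,     # 50% thinned out, 50% left as ground truth
    "pole": 0.8,        # 80% thinned out, 20% left as ground truth
    "background": 0.8,  # assumption: treated like a non-main target
}

def split_by_label(range_image: np.ndarray, label_map: np.ndarray, seed: int = 0):
    """range_image: (H, W) depths of D1; label_map: (H, W) class names per pixel,
    assumed to be projected from the label data added to the image Pc."""
    rng = np.random.default_rng(seed)
    valid = range_image > 0
    thin_mask = np.zeros(range_image.shape, dtype=bool)
    for cls, rate in THIN_RATE_BY_CLASS.items():
        region = valid & (label_map == cls)
        thin_mask |= region & (rng.random(range_image.shape) < rate)
    d2 = np.where(thin_mask, range_image, 0.0)            # thinned-out input D2
    d3 = np.where(valid & ~thin_mask, range_image, 0.0)   # remaining ground truth D3
    return d2, d3
```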
  • the information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
  • FIG. 5 is an explanatory diagram of processing executed by the information processing apparatus according to the present disclosure.
  • the LiDAR shown in FIG. 5 is data obtained by converting the three-dimensional point cloud data D1 acquired from the LiDAR 53 into an elevation image.
  • LiDAR' shown in FIG. 5 is a predetermined number of point cloud data D2 obtained by thinning out the point cloud data from the elevation image.
  • Frames t-1, t, and t+1 shown in FIG. 5 are RGB images captured three times in succession in time series, corresponding to the point cloud data D3 remaining after thinning.
  • the camera parameter K shown in FIG. 5 is an internal parameter of the camera 51, and is a parameter used for converting from UV coordinates with the origin at the upper left of the image Pc to camera coordinates centered on the camera 51.
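As a generic illustration of what the camera parameter K is used for (a standard pinhole-camera unprojection, not code from the patent):

```python
import numpy as np

def uv_to_camera_coords(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Convert a pixel (u, v), with its depth taken from the DepthMap, into camera
    coordinates centered on the camera 51 using the intrinsic matrix

        K = [[fx, 0, cx],
             [0, fy, cy],
             [0,  0,  1]],

    where (u, v) has its origin at the upper left of the image Pc."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```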
  • Velocity shown in FIG. 5 is the speed of the vehicle at frame t obtained from the communication network 41 (CAN).
  • the DepthEncoder shown in FIG. 5 is a network that extracts features from three-dimensional point cloud data.
  • the RGBEncoder shown in FIG. 5 is a network that extracts features from RGB images.
  • the Decoder shown in FIG. 5 is a network that transforms the extracted features into a DepthMap.
  • Pose shown in FIG. 5 is a network that estimates the moving distance and direction of the own vehicle from time-series images.
  • the information processing device thins out point cloud data from the three-dimensional point cloud data D1 (LiDAR shown in FIG. 5) (step S1) to generate the predetermined number of point cloud data D2 (LiDAR' shown in FIG. 5).
  • the information processing device extracts features from the predetermined number of point cloud data D2 (LiDAR' shown in FIG. 5) with the DepthEncoder (step S2).
  • the information processing device extracts features from the t-frame image corresponding to the three-dimensional point cloud data D1 (LiDAR shown in FIG. 5) with the RGBEncoder.
  • the information processing device converts the features extracted by the DepthEncoder and the RGBEncoder into a DepthMap with the Decoder (step S3). Then, the information processing device calculates SmoothLoss, which makes the depth map smooth (step S4). Further, the information processing device calculates DepthLoss, which is the difference between the DepthMap and the ground truth serving as the LiDAR teacher shown in FIG. 5 (step S5).
  • the information processing device calculates DepthLoss by, for example, Equation (1) below.
  • the information processing device performs machine learning by adjusting CNN parameters so that SmoothLoss and DepthLoss are minimized.
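Equation (1) is not reproduced in this text. As an illustration only, a DepthLoss of the kind described (the difference between the predicted DepthMap and the sparse LiDAR ground truth) and an edge-aware SmoothLoss are commonly computed along the following lines; this is a hedged sketch, not the patent's exact formulation.

```python
import numpy as np

def depth_loss(depth_map: np.ndarray, gt_points: np.ndarray) -> float:
    """Mean absolute difference between the predicted DepthMap and the ground
    truth D3, evaluated only at pixels where a LiDAR point remains (gt > 0)."""
    mask = gt_points > 0
    return float(np.abs(depth_map[mask] - gt_points[mask]).mean())

def smooth_loss(depth_map: np.ndarray, image: np.ndarray) -> float:
    """Edge-aware smoothness: penalize depth gradients, down-weighted where the
    RGB image itself has strong gradients (a common formulation, assumed here)."""
    d_dx = np.abs(np.diff(depth_map, axis=1))
    d_dy = np.abs(np.diff(depth_map, axis=0))
    gray = image.mean(axis=2)                      # image: (H, W, 3)
    i_dx = np.abs(np.diff(gray, axis=1))
    i_dy = np.abs(np.diff(gray, axis=0))
    return float((d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean())
```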
  • the information processing device estimates the moving distance and direction of the own vehicle from the time-series images of the t-1 frame, the t frame, and the t+1 frame using Pose (step S6).
  • the output of Pose is represented by the following formula (2).
  • the information processing device converts the estimated moving distance into a speed (step S7) and calculates Velocity Loss (step S8).
  • Velocity Loss is the difference between the speed derived from the travel distance estimated by Pose and the vehicle speed (Velocity).
  • Velocity Loss is calculated by the following formula (3).
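Formulas (2) and (3) are likewise not reproduced here. The idea of steps S7 and S8, comparing the speed implied by the Pose output with the speed obtained from the CAN, can be sketched as follows; the frame interval and the absolute-difference loss form are assumptions.

```python
import numpy as np

def velocity_loss(pose_translation: np.ndarray, can_speed_mps: float,
                  frame_interval_s: float = 0.1) -> float:
    """pose_translation: (3,) translation between consecutive frames estimated by
    the Pose network. can_speed_mps: vehicle speed at frame t from the CAN bus.
    The estimated travel distance is converted to a speed and compared with the
    CAN speed."""
    est_distance = float(np.linalg.norm(pose_translation))
    est_speed = est_distance / frame_interval_s
    return abs(est_speed - can_speed_mps)
```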
  • the information processing device generates an image of frame t from the preceding frame based on the Pose output, the DepthMap, and the camera parameters (step S9). After that, the information processing device generates a mask for removing the same object based on the time-series images of the t-1, t, and t+1 frames and the image generated in step S9 (step S10).
  • the information processing device uses the mask to remove the same object from the image generated in step S9 to generate a composite image. Then, the information processing device calculates Image Loss, which is the difference between the synthesized image and the true image (the image of frame t). The information processing device performs machine learning by adjusting the CNN parameters so that Image Loss is minimized.
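Steps S9 and S10 and the subsequent Image Loss calculation amount to warping a neighboring frame into frame t using the DepthMap, the Pose output, and the camera parameter K, masking out inconsistent pixels, and comparing the result with the real frame t. A minimal sketch under those assumptions, using nearest-neighbor sampling to keep it short:

```python
import numpy as np

def warp_to_frame_t(src_image: np.ndarray, depth_t: np.ndarray,
                    K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Synthesize frame t from a neighboring frame (e.g. frame t-1): unproject each
    pixel of frame t with its predicted depth, move it by the estimated pose (R, t),
    reproject it with K, and sample the source image at that location."""
    h, w = depth_t.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # (3, H*W)
    cam = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)               # rays * depth
    cam_src = R @ cam + t.reshape(3, 1)                                 # into source frame
    proj = K @ cam_src
    z = proj[2] + 1e-8                                                  # avoid divide-by-zero
    u_src = np.clip(np.round(proj[0] / z).astype(int), 0, w - 1)
    v_src = np.clip(np.round(proj[1] / z).astype(int), 0, h - 1)
    return src_image[v_src, u_src].reshape(h, w, -1)

def image_loss(synth: np.ndarray, real_t: np.ndarray, mask: np.ndarray) -> float:
    """Photometric difference between the synthesized image and the true frame t,
    ignoring the pixels removed by the mask generated in step S10."""
    diff = np.abs(synth.astype(float) - real_t.astype(float))
    return float(diff[mask].mean())
```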
  • the learning method according to the embodiment is a learning method executed by a computer. A depth image Dm corresponding to an image Pc is generated based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and machine learning is performed by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small.
  • ground truth can be generated using only raw data acquired from the LiDAR 53, so the amount of processing required to generate ground truth can be greatly reduced.
  • in the learning method, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the thinning rate of the point cloud data thinned out from the three-dimensional point cloud data D1 corresponding to the region of each object is changed for each type of object.
  • in the learning method, instead of uniformly thinning out the predetermined number of point cloud data D2 over the entire area of the three-dimensional point cloud data D1, an appropriate amount of point cloud data D2 can be thinned out for each region of the three-dimensional point cloud data D1.
  • the learning method according to the embodiment sets the thinning rate in regions of objects that are main detection targets lower than the thinning rate in regions of objects that are not main detection targets. According to the learning method according to the embodiment, it is possible to generate a more reliable ground truth for the regions of the main detection targets.
  • the learning program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small. As a result, the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • the information processing apparatus according to the embodiment includes an information processing unit that generates a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, performs machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small, and detects an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • the information processing device can generate the ground truth using only the raw data acquired from the LiDAR 53, so that the amount of processing required for generating the ground truth can be greatly reduced.
  • the information processing method according to the embodiment is an information processing method executed by a computer. A depth image Dm corresponding to an image Pc is generated based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; machine learning is performed by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small; and objects are detected from the three-dimensional point cloud data and images input to the convolutional neural network.
  • the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • the information processing program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc based on a predetermined number of point cloud data D2 thinned out from the three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that, with the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 used as the ground truth, the difference between the depth image Dm and the ground truth becomes small. As a result, the computer can generate the ground truth using only the raw data obtained from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
  • a computer-implemented learning method comprising: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • a learning program that causes a computer to execute a procedure of generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, and a procedure of performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
  • An information processing device comprising an information processing unit that generates a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, performs machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small, and detects an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • a computer-executed information processing method comprising: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small; and detecting an object from the three-dimensional point cloud data and images input to the convolutional neural network.
  • an information processing program for causing a computer to execute a procedure of generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data, a procedure of performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small, and a procedure of detecting an object from the three-dimensional point cloud data and images input to the convolutional neural network.

Abstract

A learning method according to the present disclosure is executed by a computer. The learning method includes: generating, on the basis of a predetermined number of point cloud data items (D2) thinned out from three-dimensional point cloud data (D1) acquired by LiDAR and an image (Pc) corresponding to the three-dimensional point cloud data (D1), a depth image (Dm) corresponding to the image (Pc); and performing machine learning by adjusting a coefficient of a convolutional neural network such that, when point cloud data (D3) remaining after the predetermined number of point cloud data items (D2) is thinned out from the three-dimensional point cloud data (D1) is defined as a ground truth, a difference between the depth image (Dm) and the ground truth becomes small.

Description

Learning method, learning program, information processing device, information processing method, and information processing program
The present disclosure relates to a learning method, a learning program, an information processing device, an information processing method, and an information processing program.
There is a technology that generates a depth image from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and uses the depth image to generate teacher data for machine learning of a convolutional neural network (CNN) (see, for example, Patent Document 1).
When generating the teacher data, for example, a plurality of depth images generated from three-dimensional point cloud data scanned multiple times are added (synthesized), and a stereo image derived from the synthesized image is used to remove erroneous point cloud data, thereby generating the ground truth.
Patent Document 1: Japanese Patent Application Laid-Open No. 2021-68138
However, since CNN machine learning requires a huge amount of teacher data, the above conventional technology requires a large amount of processing to generate the ground truth.
Therefore, the present disclosure proposes a learning method, a learning program, an information processing device, an information processing method, and an information processing program that can reduce the amount of processing required to generate the ground truth.
A learning method according to the present disclosure is a computer-executed learning method that includes: generating a depth image corresponding to an image based on a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and performing machine learning by adjusting the coefficients of a convolutional neural network so that, with the point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data used as the ground truth, the difference between the depth image and the ground truth becomes small.
FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure. FIG. 2 is an explanatory diagram of a learning method according to the present disclosure. FIG. 3 is a diagram illustrating an example of an image according to the present disclosure. FIG. 4 is an explanatory diagram of label data added to an image according to the present disclosure. FIG. 5 is an explanatory diagram of processing executed by an information processing apparatus according to the present disclosure.
Embodiments of the present disclosure will be described in detail below with reference to the drawings. In each of the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.
[1. Configuration example of vehicle control system]
FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
The vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
The vehicle control ECU 21, the communication unit 22, the map information accumulation unit 23, the position information acquisition unit 24, the external recognition sensor 25, the in-vehicle sensor 26, the vehicle sensor 27, the storage unit 28, the driving support/automatic driving control unit 29, the driver monitoring system (DMS) 30, the human machine interface (HMI) 31, and the vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other. The communication network 41 is composed of, for example, an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark). Different networks may be used depending on the type of data to be transmitted; for example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data. Each part of the vehicle control system 11 may also be connected directly, without going through the communication network 41, using wireless communication intended for relatively short-range communication such as NFC (Near Field Communication) or Bluetooth (registered trademark).
The communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, and the like, and transmits and receives various data.
The map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wider area, and the like.
The position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1. The acquired position information is supplied to the driving support/automatic driving control unit 29. Note that the position information acquisition unit 24 is not limited to the method using GNSS signals and may acquire position information using, for example, a beacon.
The external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11. The type and number of sensors included in the external recognition sensor 25 are arbitrary.
For example, the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
The in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle and supplies sensor data from each sensor to each part of the vehicle control system 11. The types and number of sensors included in the in-vehicle sensor 26 are not particularly limited as long as they can realistically be installed in the vehicle 1. For example, the in-vehicle sensor 26 may include one or more of a camera, a radar, a seat sensor, a steering wheel sensor, a microphone, and a biometric sensor.
The vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11. The types and number of sensors included in the vehicle sensor 27 are not particularly limited as long as they can realistically be installed in the vehicle 1. For example, the vehicle sensor 27 includes a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU) integrating them.
The storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs. The storage unit 28 includes, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied. The storage unit 28 stores various programs and data used by each unit of the vehicle control system 11.
The driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1. For example, the driving support/automatic driving control unit 29 includes an analysis unit 61, an action planning unit 62, and an operation control unit 63.
The analysis unit 61 analyzes the vehicle 1 and its surroundings. The analysis unit 61 includes a self-position estimation unit 71, a sensor fusion unit 72, and a recognition unit 73.
The self-position estimation unit 71 estimates the self-position of the vehicle 1 based on sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
The sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
The recognition unit 73 executes detection processing for detecting the situation outside the vehicle 1 and recognition processing for recognizing the situation outside the vehicle 1.
For example, the recognition unit 73 performs detection processing and recognition processing of the situation outside the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
Specifically, for example, the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1. Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object. Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object. However, detection processing and recognition processing are not always clearly separated and may overlap.
For example, the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the clusters of points classified by the clustering. As a result, the speed and traveling direction (movement vector) of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, and the like based on image data supplied from the camera 51. The recognition unit 73 may also recognize the types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
The action planning unit 62 creates an action plan for the vehicle 1. For example, the action planning unit 62 creates an action plan by performing route planning and route following processing.
Route planning (global path planning) is the process of planning a rough route from the start to the goal. This route planning also includes trajectory planning: processing to generate, along the planned route, a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity in consideration of the motion characteristics of the vehicle 1.
The operation control unit 63 controls the operation of the vehicle 1 in order to realize the action plan created by the action planning unit 62.
The DMS 30 performs driver authentication processing, driver state recognition processing, and the like based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31 described later. The driver's state to be recognized includes, for example, physical condition, alertness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like. The HMI 31 receives input of various data, instructions, and the like, and presents various data to the driver and others.
The vehicle control unit 32 controls each unit of the vehicle 1. The vehicle control unit 32 includes a steering control unit 81, a brake control unit 82, a drive control unit 83, a body system control unit 84, a light control unit 85, and a horn control unit 86.
The steering control unit 81 detects and controls the state of the steering system of the vehicle 1. The steering system includes, for example, a steering mechanism including a steering wheel, electric power steering, and the like. The steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
The brake control unit 82 detects and controls the state of the brake system of the vehicle 1. The brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like. The brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
The drive control unit 83 detects and controls the state of the drive system of the vehicle 1. The drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels. The drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
The body system control unit 84 detects and controls the state of the body system of the vehicle 1. The body system includes, for example, a keyless entry system, a smart key system, a power window device, power seats, an air conditioner, airbags, seat belts, a shift lever, and the like. The body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
The light control unit 85 detects and controls the states of various lights of the vehicle 1. Lights to be controlled include, for example, headlights, backlights, fog lights, turn signals, brake lights, projections, bumper displays, and the like. The light control unit 85 includes a light ECU that controls the lights, an actuator that drives the lights, and the like.
The horn control unit 86 detects and controls the state of the car horn of the vehicle 1. The horn control unit 86 includes, for example, a horn ECU that controls the car horn, an actuator that drives the car horn, and the like.
[2.認識部が使用する物体検出モデルの一例]
 一般的な物体検出モデルとして、SSD(Single Shot MultiBox Detector)がある。SSDは、入力画像から物体を検出するように機械学習された畳み込みニューラルネットワーク(CNN:Convolutional Neural Network)を備える。
[2. An example of an object detection model used by the recognition unit]
A general object detection model is an SSD (Single Shot MultiBox Detector). The SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images.
CNN machine learning uses training data in which each image is annotated with the types (classes) of the objects it contains and with ground truth (GT) indicating the regions of those objects in the image. To generate such training data, for example, a depth image is generated from three-dimensional point cloud data obtained by scanning an object multiple times (for example, 11 scans) with a typical LiDAR (Light Detection And Ranging) sensor having 60 vertically arranged lasers.
The generated depth images are then added (combined), and incorrect point cloud data is removed from the combined image using a stereo image to generate the ground truth. However, because CNN machine learning requires an enormous amount of training data, the amount of processing required to generate the ground truth becomes large.
[3.本開示に係る学習方法]
 そこで、図2に示すように、認識部73に含まれる情報処理装置は、縦に128個のレーザを備えたLiDAR53によって取得される3次元点群データD1から間引いた所定数の点群データD2と、3次元点群データD1に対応する画像Pcとに基づいて画像Pcに対応する深度画像Dmを生成する。
[3. Learning method according to the present disclosure]
Therefore, as shown in FIG. 2, the information processing device included in the recognition unit 73 generates a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53, which has 128 vertically arranged lasers, and the image Pc corresponding to the three-dimensional point cloud data D1.
The information processing device then uses the point cloud data D3 that remains after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 as the ground truth, and performs machine learning by adjusting the coefficients of the CNN so that the difference between the depth image Dm and the ground truth becomes small.
In this way, in the learning method according to the present disclosure, the ground truth (the point cloud data D3 remaining after thinning) can be generated simply by thinning a predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 acquired by the LiDAR 53. Therefore, compared with the general training data generation method described above, the learning method according to the present disclosure can greatly reduce the amount of processing required to generate the ground truth.
When the predetermined number of point cloud data D2 is thinned out from the three-dimensional point cloud data D1, the amount of data (number of points) to be thinned out is kept below 50% of the amount of data (number of points) in the three-dimensional point cloud data D1. This makes it possible to generate a highly reliable ground truth (the point cloud data D3 remaining after thinning) that contains more of the features of the image Pc than the predetermined number of point cloud data D2, which is one of the inputs of the training data.
Methods of thinning a predetermined number of point cloud data D2 out of the three-dimensional point cloud data D1 include, for example, thinning out random data points from the three-dimensional point cloud data D1 arranged in a matrix, thinning out one row of data points every several rows, and thinning out one column of data points every several columns.
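As a rough sketch of these thinning schemes, assuming the three-dimensional point cloud data D1 is held as a rows-by-columns range image in which NaN marks pixels with no LiDAR return (this representation, and the function and parameter names below, are illustrative assumptions and are not taken from the application):

```python
import numpy as np

def split_point_cloud(d1, mode="random", rate=0.4, step=4, rng=None):
    """Split the range image D1 into the thinned-out points D2 (the sparse
    input to the network) and the remaining points D3 (the ground truth)."""
    rng = np.random.default_rng(0) if rng is None else rng
    thin = np.zeros(d1.shape, dtype=bool)
    if mode == "random":
        thin = rng.random(d1.shape) < rate       # thin out a random fraction
    elif mode == "rows":
        thin[::step, :] = True                   # thin out one row every `step` rows
    elif mode == "columns":
        thin[:, ::step] = True                   # thin out one column every `step` columns
    d2 = np.where(thin, d1, np.nan)              # points thinned out of D1
    d3 = np.where(thin, np.nan, d1)              # points left in D1 (ground truth)
    return d2, d3

# Example: thin out 40% of a 128 x 1024 scan at random (kept below 50%).
d1 = np.random.uniform(1.0, 80.0, size=(128, 1024))
d2, d3 = split_point_cloud(d1, mode="random", rate=0.4)
```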
In addition, some images Pc have label data attached that includes data indicating the regions of subjects in the image Pc and data indicating the types (classes) of those subjects. Therefore, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the information processing device changes, for each type of object, the thinning rate of the point cloud data thinned out of the three-dimensional point cloud data corresponding to the region of that object.
As a result, rather than uniformly thinning a predetermined number of point cloud data D2 out of the entire three-dimensional point cloud data D1, the information processing device can thin out an appropriate amount of point cloud data D2 for each region of the three-dimensional point cloud data D1 according to the characteristics of the subject and the purpose of object detection.
For example, as shown in FIG. 3, the image Pc shows a vehicle Vc, a plurality of poles Po, and a background Bg. As shown in FIG. 4, label data LVc indicating the region of the vehicle Vc and indicating that the object in that region is the vehicle Vc is added to the image Pc.
Further, label data LPo indicating the regions of the poles Po and indicating that the objects in those regions are poles Po, and label data LBg indicating the region of the background Bg and indicating that the object in that region is the background Bg, are also added to the image Pc.
 この場合、情報処理装置は、検出対象として主要な物体の領域における間引き率を、検出対象として主要でない物体の領域における間引き率よりも低くする。換言すれば、情報処理装置は、検出対象として主要な物体の領域に残す点群データの量を、検出対象として主要でない物体の領域における点群データの量よりも多くする。これにより、情報処理装置は、検出対象として主要な物体の領域に対してより信頼性の高いグランドトゥルースを生成することができる。 In this case, the information processing apparatus sets the thinning rate in the area of the main object as the detection target lower than the thinning rate in the area of the non-main object as the detection target. In other words, the information processing apparatus makes the amount of point cloud data to be left in the area of the main object as the detection target larger than the amount of the point cloud data in the area of the object that is not the main detection target. Thereby, the information processing apparatus can generate ground truth with higher reliability for the region of the main object as the detection target.
For example, when the purpose of object detection is to detect other vehicles Vc while the own vehicle is traveling, the information processing device thins out, for the region of the vehicle Vc, a predetermined number of point cloud data D2 amounting to 50% of the three-dimensional point cloud data D1, and leaves the remaining 50% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
When the poles Po are not detection targets, the information processing device thins out, for the regions of the poles Po, a predetermined number of point cloud data D2 amounting to 80% of the three-dimensional point cloud data D1, and leaves the remaining 20% of the point cloud data (the point cloud data D3 remaining after thinning) as the ground truth.
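A per-class variant of the same split could look like the sketch below, assuming the label data (LVc, LPo, LBg) has already been rasterized into a per-pixel class map aligned with the range image; the class names and thinning rates follow the 50% / 80% figures above but are otherwise illustrative assumptions:

```python
import numpy as np

# Illustrative thinning rates: the main detection target (vehicle) gets a
# lower rate, so more of its points remain as ground truth D3.
THINNING_RATE = {"vehicle": 0.5, "pole": 0.8, "background": 0.8}

def split_by_class(d1, class_map, rng=None):
    """d1: range image (NaN = no return); class_map: same-shape array of
    class names derived from the label data attached to the image Pc."""
    rng = np.random.default_rng(0) if rng is None else rng
    thin = np.zeros(d1.shape, dtype=bool)
    for name, rate in THINNING_RATE.items():
        region = (class_map == name)
        thin |= region & (rng.random(d1.shape) < rate)
    d2 = np.where(thin, d1, np.nan)    # thinned-out points (network input)
    d3 = np.where(thin, np.nan, d1)    # remaining points (ground truth)
    return d2, d3
```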
 この場合、ポールPoのグランドトゥルースとなる間引かれて残った点群データD3のデータ量が少なくなるが、ポールPoは、LiDAR53のレーザが当たり難い物体である。このため、情報処理装置は、ポールPoの検出を行う場合には、主に画像Pcのデータを使用する。情報処理装置は、記憶部28に記憶された情報処理プログラムを実行することによって、上記したCNNの機械学習および物体検出処理を行う。 In this case, the amount of data in the point cloud data D3 remaining after thinning, which becomes the ground truth of the pole Po, is reduced, but the pole Po is an object that is difficult for the laser of the LiDAR 53 to hit. Therefore, the information processing apparatus mainly uses the data of the image Pc when detecting the pole Po. The information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
[4.情報処理装置が実行する処理]
 図5は、本開示に係る情報処理装置が実行する処理の説明図である。図5に示すLiDARは、LiDAR53から取得する3次元点群データD1を立面画像にしたデータである。図5に示すLiDAR´は、立面画像から点群データを間引いた所定数の点群データD2である。図5に示すt-1フレーム、tフレーム、t+1フレームは、間引かれて残った点群データD3に対応する時系列に3連続で撮像されたRGB画像である。
[4. Processing executed by information processing device]
FIG. 5 is an explanatory diagram of processing executed by the information processing device according to the present disclosure. "LiDAR" shown in FIG. 5 is the three-dimensional point cloud data D1 acquired from the LiDAR 53 rendered as an elevation image. "LiDAR´" shown in FIG. 5 is the predetermined number of point cloud data D2 obtained by thinning out point cloud data from that elevation image. The t−1 frame, t frame, and t+1 frame shown in FIG. 5 are RGB images captured at three consecutive points in the time series corresponding to the point cloud data D3 remaining after thinning.
The camera parameter K shown in FIG. 5 is an internal parameter of the camera 51, used to convert UV coordinates whose origin is the upper-left corner of the image Pc into camera coordinates centered on the camera 51. Velocity shown in FIG. 5 is the vehicle speed at the t frame obtained from the communication network 41 (CAN).
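The role of the camera parameter K can be illustrated with the usual pinhole back-projection, x_cam = depth · K⁻¹ · (u, v, 1)ᵀ; the intrinsic values below are placeholders and are not those of the camera 51:

```python
import numpy as np

def uv_to_camera(u, v, depth, K):
    """Back-project a pixel (u, v) with known depth into camera coordinates
    centered on the camera, using the pinhole model."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

# Placeholder intrinsics: focal lengths and principal point in pixels.
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
print(uv_to_camera(800, 400, 12.5, K))  # -> 3D point in the camera frame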
 図5に示すDepthencorderは、3次元点群データから特徴を抽出するネットワークである。図5に示すRGBencorderは、RGB画像から特徴を抽出するネットワークである。図5に示すDecorderは、抽出された特徴をDepthMapに変換するネットワークである。図5に示すPoseは、時系列な画像から自車両の移動距離と向きを推定するネットワークである。 Depthencorder shown in FIG. 5 is a network that extracts features from 3D point cloud data. RGBencoder shown in FIG. 5 is a network for extracting features from RGB images. The Decorder shown in FIG. 5 is a network that transforms the extracted features into a DepthMap. Pose shown in FIG. 5 is a network for estimating the moving distance and direction of the own vehicle from time-series images.
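FIG. 5 only names these networks, so the skeleton below is a minimal illustrative layout (layer sizes, activations, and class names are assumptions) showing how the Depthencorder, RGBencorder, Decorder, and Pose components could fit together in PyTorch:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared shape for the Depthencorder / RGBencorder: a small convolutional
    stack that maps an input map (sparse depth or RGB) to a feature map."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class DepthDecoder(nn.Module):
    """Decorder role: fuse the two feature maps and upsample back to a DepthMap."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Softplus(),  # strictly positive depth
        )

    def forward(self, f_depth, f_rgb):
        return self.net(torch.cat([f_depth, f_rgb], dim=1))

class PoseNet(nn.Module):
    """Pose role: estimate a 6-DoF relative motion (translation + rotation)
    from two consecutive RGB frames stacked along the channel axis."""
    def __init__(self):
        super().__init__()
        self.features = Encoder(in_ch=6)
        self.head = nn.Conv2d(64, 6, 1)

    def forward(self, img_a, img_b):
        f = self.features(torch.cat([img_a, img_b], dim=1))
        return self.head(f).mean(dim=(2, 3))  # [tx, ty, tz, rx, ry, rz]
```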
When the three-dimensional point cloud data D1 (LiDAR in FIG. 5) is input from the LiDAR 53, the information processing device thins out point cloud data from the three-dimensional point cloud data D1 (step S1) and generates the predetermined number of point cloud data D2 (LiDAR´ in FIG. 5).
The information processing device then extracts features from the predetermined number of point cloud data D2 (LiDAR´ in FIG. 5) with the Depthencorder (step S2). The information processing device also extracts features from the t-frame image corresponding to the three-dimensional point cloud data D1 (LiDAR in FIG. 5) with the RGBencorder.
After that, the information processing device converts the features extracted by the Depthencorder and the RGBencorder into a DepthMap with the Decorder (step S3). The information processing device then calculates a SmoothLoss that encourages the DepthMap to be smooth (step S4). Further, the information processing device calculates a DepthLoss, which is the difference between the DepthMap and the ground truth that serves as the teacher for the LiDAR data shown in FIG. 5 (step S5).
The information processing device calculates the DepthLoss by, for example, Equation (1) below.
[Equation (1): formula image not reproduced in this text.]
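Equation (1) is published only as an image, so the sketch below shows one common stand-in consistent with the description: a masked L1 difference between the predicted DepthMap and the sparse ground truth D3, together with an edge-aware SmoothLoss. Both exact forms are assumptions, not the application's formulas.

```python
import torch

def depth_loss(depth_map, gt_depth):
    """Masked L1 difference between the predicted DepthMap and the sparse
    ground truth D3; pixels with no ground-truth point are marked by 0."""
    mask = gt_depth > 0
    return (depth_map[mask] - gt_depth[mask]).abs().mean()

def smooth_loss(depth_map, image):
    """Edge-aware smoothness: penalize depth gradients, weighted down at
    image edges (a common formulation, assumed here).
    depth_map: (B,1,H,W), image: (B,3,H,W)."""
    dz_x = (depth_map[..., :, 1:] - depth_map[..., :, :-1]).abs()
    dz_y = (depth_map[..., 1:, :] - depth_map[..., :-1, :]).abs()
    di_x = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    di_y = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (dz_x * torch.exp(-di_x)).mean() + (dz_y * torch.exp(-di_y)).mean()
```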
 情報処理装置は、SmoothLossおよびDepthLossが最小となるように、CNNのパラメータを調整して機械学習する。 The information processing device performs machine learning by adjusting CNN parameters so that SmoothLoss and DepthLoss are minimized.
The information processing device also uses Pose to estimate the moving distance and orientation of the own vehicle from the time-series images of the t−1, t, and t+1 frames (step S6). The output of Pose is represented by Equation (2) below.
[Equation (2): formula image not reproduced in this text.]
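Equation (2) is likewise published only as an image. A common parameterization of a pose output is a translation plus an axis-angle rotation; the hypothetical helper below converts such a 6-vector into a 4x4 rigid transform and is an assumption rather than the application's formula.

```python
import numpy as np

def pose_to_matrix(pose6):
    """Turn a 6-vector [tx, ty, tz, rx, ry, rz] (translation + axis-angle
    rotation) into a 4x4 rigid transform using Rodrigues' formula."""
    t, r = np.asarray(pose6[:3]), np.asarray(pose6[3:])
    theta = np.linalg.norm(r)
    if theta < 1e-8:
        R = np.eye(3)
    else:
        k = r / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```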
Further, the information processing device converts the estimated moving distance into a speed (step S7) and calculates a velocity Loss (step S8). The velocity Loss is the difference between the moving distance estimated by Pose and the moving distance converted from the vehicle speed. The velocity Loss is calculated by Equation (3) below.
[Equation (3): formula image not reproduced in this text.]
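Equation (3) is also published only as an image. A plausible reading of the description (the distance estimated by Pose is compared against the distance implied by the CAN vehicle speed over one frame interval) is the L1 consistency term sketched below; the function and argument names are assumptions.

```python
import torch

def velocity_loss(pose_translation, can_speed, frame_dt):
    """pose_translation: (B, 3) translation from Pose for one frame step;
    can_speed: vehicle speed from the CAN [m/s]; frame_dt: frame interval [s]."""
    est_distance = pose_translation.norm(dim=-1)   # metres moved per frame
    can_distance = can_speed * frame_dt            # speed [m/s] * dt [s]
    return (est_distance - can_distance).abs().mean()
```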
The information processing device also generates the image of the t frame from the preceding frame on the basis of the Pose output, the DepthMap, and the camera parameter (step S9). After that, the information processing device generates a mask for removing the same object on the basis of the time-series images of the t−1, t, and t+1 frames and the image generated in step S9 (step S10).
Subsequently, the information processing device removes the same object from the image generated in step S9 using the mask to generate a synthesized image. The information processing device then calculates an image Loss, which is the difference between the synthesized image and the true image (the image of the t frame). The information processing device performs machine learning by adjusting the CNN parameters so that the image Loss is minimized.
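Steps S9 and S10 correspond to a standard view-synthesis setup. The sketch below assumes that form (inverse-warping a neighbouring frame into the t frame with the predicted depth, the Pose output, and K, then taking a masked L1 photometric difference); the helper names and the grid_sample-based warp are assumptions rather than the application's exact procedure.

```python
import torch
import torch.nn.functional as F

def reconstruct_t(source_img, depth_t, T_t_to_s, K):
    """Inverse-warp a neighbouring frame into the t frame.
    source_img: (B,3,H,W), depth_t: (B,1,H,W), T_t_to_s: (B,4,4), K: (B,3,3)."""
    B, _, H, W = depth_t.shape
    device = depth_t.device
    v, u = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1)
    cam = torch.inverse(K) @ pix * depth_t.reshape(B, 1, -1)      # back-project
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = (T_t_to_s @ cam_h)[:, :3]                               # move to source frame
    proj = K @ src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)               # project to pixels
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(source_img, grid, padding_mode="border", align_corners=True)

def image_loss(recon_t, true_t, mask):
    """Masked L1 photometric difference between the reconstructed and true
    t-frame images (mask from step S10; exact form assumed)."""
    diff = (recon_t - true_t).abs().mean(dim=1, keepdim=True)
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```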
[5.効果]
 実施形態に係る学習方法は、コンピュータが実行する学習方法であって、LiDAR53によって取得される3次元点群データD1から間引いた所定数の点群データD2と、3次元点群データD1に対応する画像Pcとに基づいて画像に対応する深度画像Dmを生成し、3次元点群データD1から所定数の点群データD2が間引かれて残った点群データD3をグランドトゥルースとして、深度画像Dmとグランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する。実施形態に係る学習方法によれば、LiDAR53から取得するロウデータのみを使用してグランドトゥルースを生成できるので、グランドトゥルースの生成に要する処理量を大幅に低減できる。
[5. Effects]
The learning method according to the embodiment is a learning method executed by a computer, in which a depth image Dm corresponding to an image Pc is generated on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and machine learning is performed by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth. According to the learning method of the embodiment, the ground truth can be generated using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
In addition, in the learning method according to the embodiment, when a label indicating the type of an object appearing in the image Pc and data indicating the region of the object in the image Pc are associated with the image, the thinning rate of the point cloud data thinned out of the three-dimensional point cloud data D1 corresponding to the region of the object is changed for each type of object. According to the learning method of the embodiment, rather than uniformly thinning a predetermined number of point cloud data D2 out of the entire three-dimensional point cloud data D1, an appropriate amount of point cloud data D2 can be thinned out for each region of the three-dimensional point cloud data D1 according to the characteristics of the subject and the purpose of object detection.
 また、実施形態に係る学習方法は、検出対象として主要な物体の領域における間引き率を、検出対象として主要でない物体の領域における間引き率よりも低くする。実施形態に係る学習方法によれば、検出対象として主要な物体の領域に対してより信頼性の高いグランドトゥルースを生成することができる。 In addition, the learning method according to the embodiment sets the thinning rate in areas of objects that are primary as detection targets to be lower than the thinning rate in areas of objects that are not primary as detection targets. According to the learning method according to the embodiment, it is possible to generate ground truth with higher reliability for regions of main objects as detection targets.
The learning program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, and a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing device according to the embodiment includes an information processing unit that generates a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1, performs machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth, and detects objects from three-dimensional point cloud data and images input to the convolutional neural network. This allows the information processing device to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing method according to the embodiment is an information processing method executed by a computer, in which a depth image Dm corresponding to an image Pc is generated on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; machine learning is performed by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth; and objects are detected from three-dimensional point cloud data and images input to the convolutional neural network. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
The information processing program according to the embodiment causes a computer to execute a procedure of generating a depth image Dm corresponding to an image Pc on the basis of a predetermined number of point cloud data D2 thinned out from three-dimensional point cloud data D1 acquired by the LiDAR 53 and the image Pc corresponding to the three-dimensional point cloud data D1; a procedure of performing machine learning by adjusting the coefficients of the convolutional neural network so that the difference between the depth image Dm and the ground truth becomes small, where the point cloud data D3 remaining after the predetermined number of point cloud data D2 has been thinned out from the three-dimensional point cloud data D1 is used as the ground truth; and a procedure of detecting objects from three-dimensional point cloud data and images input to the convolutional neural network. This allows the computer to generate the ground truth using only the raw data acquired from the LiDAR 53, so the amount of processing required to generate the ground truth can be greatly reduced.
 なお、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、また他の効果があってもよい。 It should be noted that the effects described in this specification are only examples and are not limited, and other effects may also occur.
 なお、本技術は以下のような構成も取ることができる。
(1)
 コンピュータが実行する学習方法であって、
 LiDAR(Light Detection And Ranging)によって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する
 ことを含む学習方法。
(2)
 前記画像に写る物体の種類を示すラベルと、前記画像における前記物体の領域を示すデータとが前記画像に対応付けられている場合に、前記物体の種類毎に、前記物体の領域に対応する前記3次元点群データから間引く点群データの間引き率を変更する
 前記(1)に記載の学習方法。
(3)
 検出対象として主要な前記物体の領域における前記間引き率を、検出対象として主要でない前記物体の領域における前記間引き率よりも低くする
 前記(2)に記載の学習方法。
(4)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と
 をコンピュータに実行させる学習プログラム。
(5)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する情報処理部
 を備える情報処理装置。
(6)
 コンピュータが実行する情報処理方法であって、
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する
 ことを含む情報処理方法。
(7)
 LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
 前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と、
 前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する手順と
 をコンピュータに実行させる情報処理プログラム。
Note that the present technology can also take the following configuration.
(1)
A computer implemented learning method comprising:
generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and
performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
(2)
The learning method according to (1) above, wherein, when a label indicating a type of an object appearing in the image and data indicating a region of the object in the image are associated with the image, a thinning rate of point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object is changed for each type of the object.
(3)
The learning method according to (2), wherein the thinning rate in the object area that is the main detection target is lower than the thinning rate in the object area that is not the main detection target.
(4)
A learning program causing a computer to execute:
a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; and
a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
(5)
An information processing device comprising an information processing unit that:
generates a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
performs machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
detects an object from three-dimensional point cloud data and an image input to the convolutional neural network.
(6)
A computer-executed information processing method comprising:
generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
(7)
An information processing program causing a computer to execute:
a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
a procedure of detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
Pc: Image
D1: Three-dimensional point cloud data
D2: Predetermined number of point cloud data (thinned out)
D3: Point cloud data remaining after thinning
Dm: Depth image

Claims (7)

  1.  コンピュータが実行する学習方法であって、
     LiDAR(Light Detection And Ranging)によって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する
     ことを含む学習方法。
    A computer implemented learning method comprising:
    generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR (Light Detection And Ranging) and the image corresponding to the three-dimensional point cloud data; and
    performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
  2.  前記画像に写る物体の種類を示すラベルと、前記画像における前記物体の領域を示すデータとが前記画像に対応付けられている場合に、前記物体の種類毎に、前記物体の領域に対応する前記3次元点群データから間引く点群データの間引き率を変更する
     請求項1に記載の学習方法。
    2. The learning method according to claim 1, wherein, when a label indicating a type of an object appearing in the image and data indicating a region of the object in the image are associated with the image, a thinning rate of point cloud data thinned out from the three-dimensional point cloud data corresponding to the region of the object is changed for each type of the object.
  3.  検出対象として主要な前記物体の領域における前記間引き率を、検出対象として主要でない前記物体の領域における前記間引き率よりも低くする
     請求項2に記載の学習方法。
    3. The learning method according to claim 2, wherein the thinning rate in areas of the object that are primary as detection targets is set lower than the thinning rate in areas of the object that are not primary as detection targets.
  4.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と
     をコンピュータに実行させる学習プログラム。
    A learning program causing a computer to execute:
    a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data; and
    a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data.
  5.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する情報処理部
     を備える情報処理装置。
    An information processing device comprising an information processing unit that:
    generates a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    performs machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    detects an object from three-dimensional point cloud data and an image input to the convolutional neural network.
  6.  コンピュータが実行する情報処理方法であって、
     LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成し、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習し、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する
     ことを含む情報処理方法。
    A computer-executed information processing method comprising:
    generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
  7.  LiDARによって取得される3次元点群データから間引いた所定数の点群データと、前記3次元点群データに対応する画像とに基づいて前記画像に対応する深度画像を生成する手順と、
     前記3次元点群データから前記所定数の点群データが間引かれて残った点群データをグランドトゥルースとして、前記深度画像と前記グランドトゥルースとの差分が小さくなるように、畳み込みニューラルネットワークの係数を調整して機械学習する手順と、
     前記畳み込みニューラルネットワークに入力される3次元点群データおよび画像から物体を検出する手順と
     をコンピュータに実行させる情報処理プログラム。
    An information processing program causing a computer to execute:
    a procedure of generating a depth image corresponding to an image on the basis of a predetermined number of point cloud data thinned out from three-dimensional point cloud data acquired by LiDAR and the image corresponding to the three-dimensional point cloud data;
    a procedure of performing machine learning by adjusting coefficients of a convolutional neural network so that a difference between the depth image and a ground truth becomes small, the ground truth being point cloud data remaining after the predetermined number of point cloud data has been thinned out from the three-dimensional point cloud data; and
    a procedure of detecting an object from three-dimensional point cloud data and an image input to the convolutional neural network.
PCT/JP2022/038868 2021-11-09 2022-10-19 Learning method, learning program, information processing device, information processing method, and information processing program WO2023085017A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-182831 2021-11-09
JP2021182831 2021-11-09

Publications (1)

Publication Number Publication Date
WO2023085017A1 true WO2023085017A1 (en) 2023-05-19

Family

ID=86335676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/038868 WO2023085017A1 (en) 2021-11-09 2022-10-19 Learning method, learning program, information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2023085017A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000090272A (en) * 1998-09-16 2000-03-31 Hitachi Zosen Corp Selecting method for shoes
WO2020053611A1 (en) * 2018-09-12 2020-03-19 Toyota Motor Europe Electronic device, system and method for determining a semantic grid of an environment of a vehicle
WO2020116195A1 (en) * 2018-12-07 2020-06-11 ソニーセミコンダクタソリューションズ株式会社 Information processing device, information processing method, program, mobile body control device, and mobile body
JP2020146449A (en) * 2019-03-06 2020-09-17 国立大学法人九州大学 Magnetic resonance image high-speed reconfiguring method and magnetic resonance imaging device

Similar Documents

Publication Publication Date Title
US11531354B2 (en) Image processing apparatus and image processing method
JP7188394B2 (en) Image processing device and image processing method
JP7180670B2 (en) Control device, control method and program
JPWO2019082670A1 (en) Information processing equipment, information processing methods, programs, and mobiles
JPWO2019077999A1 (en) Image pickup device, image processing device, and image processing method
US20240054793A1 (en) Information processing device, information processing method, and program
KR20220020804A (en) Information processing devices and information processing methods, and programs
WO2021241189A1 (en) Information processing device, information processing method, and program
CN110281934A (en) Controller of vehicle, control method for vehicle and storage medium
US20230215151A1 (en) Information processing apparatus, information processing method, information processing system, and a program
WO2019150918A1 (en) Information processing device, information processing method, program, and moving body
US20220277556A1 (en) Information processing device, information processing method, and program
JP7198742B2 (en) AUTOMATED DRIVING VEHICLE, IMAGE DISPLAY METHOD AND PROGRAM
WO2023153083A1 (en) Information processing device, information processing method, information processing program, and moving device
JPWO2020036043A1 (en) Information processing equipment, information processing methods and programs
WO2023085017A1 (en) Learning method, learning program, information processing device, information processing method, and information processing program
US20230245423A1 (en) Information processing apparatus, information processing method, and program
US20230289980A1 (en) Learning model generation method, information processing device, and information processing system
WO2021193103A1 (en) Information processing device, information processing method, and program
WO2023085190A1 (en) Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program
WO2023021755A1 (en) Information processing device, information processing system, model, and model generation method
WO2023054090A1 (en) Recognition processing device, recognition processing method, and recognition processing system
WO2023090001A1 (en) Information processing device, information processing method, and program
WO2024024471A1 (en) Information processing device, information processing method, and information processing system
US20230410486A1 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892517

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023559513

Country of ref document: JP