WO2023085190A1 - Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program - Google Patents

Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program

Info

Publication number
WO2023085190A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
default box
data
information processing
label data
Prior art date
Application number
PCT/JP2022/041036
Other languages
French (fr)
Japanese (ja)
Inventor
貴裕 平野
Original Assignee
ソニーセミコンダクタソリューションズ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーセミコンダクタソリューションズ株式会社
Priority to JP2023559595A (JPWO2023085190A1)
Publication of WO2023085190A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • the present disclosure relates to a teacher data generation method, a teacher data generation program, an information processing device, an information processing method, and an information processing program.
  • SSD: Single Shot MultiBox Detector
  • CNN: Convolutional Neural Network
  • GT: rectangular ground truth
  • When generating teacher data, first, a plurality of input images in which objects appear and labeled images, to which rectangular label data surrounding the object in each input image has been added, are prepared. Then, image features are extracted from each input image to generate a feature map.
  • In the matching between a default box and the label data, an IoU (Intersection over Union) value is calculated, indicating the area of the overlap between the default box and the label data relative to the total area that the two occupy in the feature map.
  • A default box whose IoU value is greater than or equal to a threshold is determined as the GT to generate teacher data. As the GT, therefore, a default box whose position, size, and aspect ratio in the feature map are similar to those of the label data is necessarily selected.
  • During machine learning, a large amount of teacher data generated in this way is input to the CNN.
  • The CNN repeats the process of detecting an object from each input teacher data image.
  • The CNN then adjusts each parameter of the network so that the difference between the position, shape, and size of the detected object in the feature map and the position, shape, and size of the GT in the feature map is reduced.
  • As a result, the CNN can place the default boxes obtained through learning on a feature map extracted from an unknown input image, and can derive the accuracy of the position, shape, and type of objects appearing in the input image from the pixel data inside those default boxes.
  • the present disclosure proposes a teacher data generation method, a teacher data generation program, an information processing device, an information processing method, and an information processing program that can improve object detection accuracy.
  • A teacher data generation method according to the present disclosure is a teacher data generation method executed by a computer, in which a shape transformation is performed on a default box arranged on a feature map extracted from an image and on label data given to an object in the image, and a default box to be used as the ground truth of the image is determined by matching the shape-transformed default box against the label data, thereby generating teacher data.
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure.
  • FIG. 2 is a diagram showing an example of a general teacher data generation method.
  • Several subsequent figures show verification results of object detection.
  • Two figures show verification results of default box matching during learning.
  • One figure shows an example of the shape conversion according to the embodiment.
  • One figure shows a verification result of default box matching before the shape conversion, and another shows the verification result after the shape conversion.
  • Further figures show verification results of object detection.
  • FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
  • the vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
  • The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
  • The vehicle control ECU 21, communication unit 22, map information accumulation unit 23, position information acquisition unit 24, external recognition sensor 25, in-vehicle sensor 26, vehicle sensor 27, storage unit 28, driving support/automatic driving control unit 29, driver monitoring system (DMS) 30, human machine interface (HMI) 31, and vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other.
  • The communication network 41 is composed of an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark).
  • The communication network 41 may be used selectively depending on the type of data to be transmitted; for example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data.
  • Each part of the vehicle control system 11 may also be connected directly, without going through the communication network 41, using wireless communication intended for relatively short ranges, such as NFC (Near Field Communication) or Bluetooth (registered trademark).
  • the communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, etc., and transmits and receives various data.
  • The map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wide area, and the like.
  • the position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1 .
  • the acquired position information is supplied to the driving support/automatic driving control unit 29 .
  • the location information acquisition unit 24 is not limited to the method using GNSS signals, and may acquire location information using beacons, for example.
  • the external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the type and number of sensors included in the external recognition sensor 25 are arbitrary.
  • the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
  • the in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle, and supplies sensor data from each sensor to each part of the vehicle control system 11 .
  • the types and number of various sensors included in the in-vehicle sensor 26 are not particularly limited as long as they are the types and number that can be realistically installed in the vehicle 1 .
  • For example, the in-vehicle sensor 26 may comprise one or more of cameras, radars, seat sensors, steering wheel sensors, microphones, and biometric sensors.
  • the vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1, and supplies sensor data from each sensor to each section of the vehicle control system 11.
  • the types and number of various sensors included in the vehicle sensor 27 are not particularly limited as long as the types and number are practically installable in the vehicle 1 .
  • the vehicle sensor 27 includes a velocity sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU (Inertial Measurement Unit)) integrating them.
  • the storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs.
  • The storage unit 28 is used as, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied.
  • the storage unit 28 stores various programs and data used by each unit of the vehicle control system 11 .
  • the driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1 .
  • the driving support/automatic driving control unit 29 includes an analysis unit 61 , an action planning unit 62 and an operation control unit 63 .
  • the analysis unit 61 analyzes the vehicle 1 and its surroundings.
  • the analysis unit 61 includes a self-position estimation unit 71 , a sensor fusion unit 72 and a recognition unit 73 .
  • the self-position estimation unit 71 estimates the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
  • The sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
  • the recognition unit 73 executes a detection process for detecting the situation outside the vehicle 1 and a recognition process for recognizing the situation outside the vehicle 1 .
  • The recognition unit 73 performs detection processing and recognition processing of the situation outside the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
  • the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1 .
  • Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object.
  • Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object.
  • detection processing and recognition processing are not always clearly separated, and may overlap.
  • The recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
  • The recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the clusters of points classified by the clustering. As a result, the speed and traveling direction (movement vector) of objects around the vehicle 1 are detected.
  • the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, etc. based on image data supplied from the camera 51 . Further, the recognition unit 73 may recognize types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
  • the action plan section 62 creates an action plan for the vehicle 1.
  • the action planning unit 62 creates an action plan by performing route planning and route following processing.
  • Route planning (global path planning) is the process of planning a rough route from the start to the goal. This route planning also includes what is called trajectory planning: processing that, along the planned route, generates a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity, in consideration of the motion characteristics of the vehicle 1.
  • the motion control unit 63 controls the motion of the vehicle 1 in order to implement the action plan created by the action planning unit 62.
  • the DMS 30 performs driver authentication processing, driver state recognition processing, etc., based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31, which will be described later.
  • As the state of the driver to be recognized, for example, physical condition, wakefulness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like are assumed.
  • the HMI 31 inputs various data, instructions, etc., and presents various data to the driver or the like.
  • the vehicle control unit 32 controls each unit of the vehicle 1.
  • the vehicle control section 32 includes a steering control section 81 , a brake control section 82 , a drive control section 83 , a body system control section 84 , a light control section 85 and a horn control section 86 .
  • the steering control unit 81 detects and controls the state of the steering system of the vehicle 1 .
  • the steering system includes, for example, a steering mechanism including a steering wheel, an electric power steering, and the like.
  • the steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
  • the brake control unit 82 detects and controls the state of the brake system of the vehicle 1 .
  • the brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like.
  • the brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
  • the drive control unit 83 detects and controls the state of the drive system of the vehicle 1 .
  • the drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels.
  • the drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
  • the body system control unit 84 detects and controls the state of the body system of the vehicle 1 .
  • the body system includes, for example, a keyless entry system, smart key system, power window device, power seat, air conditioner, air bag, seat belt, shift lever, and the like.
  • the body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
  • the light control unit 85 detects and controls the states of various lights of the vehicle 1 .
  • Lights to be controlled include, for example, headlights, backlights, fog lights, turn signals, brake lights, projections, bumper displays, and the like.
  • the light control unit 85 includes a light ECU that controls the light, an actuator that drives the light, and the like.
  • the horn control unit 86 detects and controls the state of the car horn of the vehicle 1 .
  • the horn control unit 86 includes, for example, a horn ECU for controlling the car horn, an actuator for driving the car horn, and the like.
  • a general object detection model is an SSD (Single Shot MultiBox Detector).
  • the SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images.
  • CNN machine learning uses teacher data in which the type (class) of an object included in the image and a rectangular ground truth (GT) indicating the area of the object in the image are given to the image.
  • FIG. 2 is a diagram showing an example of a general training data generation method.
  • When generating teacher data, first, an input image Pa showing an object Tg and a labeled image Pb, to which rectangular label data Ld surrounding the object Tg shown in the input image Pa has been added, are prepared. Then, image features are extracted from the input image Pa to generate a feature map Fm.
  • a plurality of default boxes Db with different aspect ratios and sizes for each layer of CNN are sequentially arranged at arbitrary positions on the feature map Fm.
  • the arranged default box Db and the label data Ld are matched to determine the default box Db to be the GT of the input image Pa, thereby generating teacher data.
  • In the matching between the default box Db and the label data Ld, an IoU (Intersection over Union) value is calculated, indicating the area of the overlap between the default box Db and the label data Ld relative to the total area that the two occupy in the feature map Fm.
  • a default box Db whose IoU value is equal to or greater than the threshold value is determined as GT to generate teacher data. Therefore, as GT, the default box Db whose position, size and aspect ratio in the feature map Fm are similar to those of the label data Ld is necessarily selected.
  • a large amount of teacher data generated in this way is input to CNN.
  • The CNN repeats the process of detecting the object Tg from each of the multiple input images Pa serving as teacher data. Then, the CNN adjusts each parameter of the network so that the difference between the detected position, shape, and size of the object Tg in the feature map Fm and the position, shape, and size of the GT in the feature map Fm becomes small.
  • As a result, the CNN can arrange the default box Db obtained through learning on a feature map extracted from an unknown input image, and can derive, from the pixel data in the default box Db, the accuracy of the position, shape, and type of the object appearing in the input image.
  • In the verification, the position of the image (GT) of the object Tg was shifted by 1 [pix] at a time in the horizontal direction (both left and right) with respect to the arranged default box Db, and object detection results were obtained for all shift amounts.
  • As a result, the evaluation result shown in FIG. 3 was obtained.
  • The width of the distribution shown in FIG. 3 represents the amount of data at that portion (shift position).
  • the shape of the automobile is a substantially square or a horizontally long rectangle.
  • the shape of the motorcycle is a vertically long rectangle when the direction of travel of the motorcycle is the same as or opposite to that of the own vehicle.
  • As for the shape of a bicycle, a bicycle often crosses in front of the own vehicle, and only rarely travels ahead of the own vehicle in the same direction as, or the direction opposite to, the own vehicle. Therefore, the shape of a bicycle in the image is approximately a square or a horizontally long rectangle.
  • the shape of a person is a vertically long rectangle.
  • It can be inferred that the score of the detection result does not decrease significantly for an object whose shape in the image is approximately a square or a horizontally long rectangle, whereas the score of the detection result decreases greatly for an object whose shape is a vertically long rectangle.
  • When the label data Ld is a vertically long rectangle and the default box Db is a vertically long rectangle having the same aspect ratio and size as the label data Ld, the Jaccard value is 100% when the positions of the label data Ld and the default box Db coincide. If the label data Ld is shifted rightward by 4 [pix] from this state, the Jaccard value is reduced to 53.2%, roughly half.
  • In that case, the feature amount of the object (GT) extracted from the feature map Fm also becomes only about half of the whole, and the detection accuracy decreases.
  • Therefore, the information processing device included in the recognition unit 73 performs shape conversion on the default box Db arranged on the feature map extracted from the image and on the label data Ld given to the object in the image. Then, the information processing device determines the default box Db to be used as the ground truth (GT) of the image by matching the shape-converted default box Db against the label data Ld, and generates teacher data.
  • the information processing device can improve Jaccard in default box matching during learning by performing shape conversion that brings the shapes of the default box Db and the label data Ld closer to a square. Therefore, the information processing apparatus can improve object detection accuracy by performing machine learning using the teacher data according to the present embodiment.
  • the initial label data Ld is a vertically long rectangle with an aspect ratio of 2:1 surrounding the image of a person
  • the default box Db has the same aspect ratio and shape as the label data Ld.
  • the information processing device changes the aspect ratios of the default box Db and the label data Ld.
  • the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size.
  • the change of the aspect ratio performed by the information processing device includes the inverse conversion of the aspect ratio.
  • As a result, the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size. For example, in the case of the default box Db and the label data Ld shown in the figure, their aspect ratios are converted to generate a default box Db' and label data Ld'.
  • The information processing device performs this shape conversion without changing the center position P of the default box Db or of the label data Ld, generating the default box Db' and the label data Ld' (a short code sketch of this conversion, and of its effect on the Jaccard value, appears after this list).
  • By keeping the label data Ld and the label data Ld' aligned in this way, even if the label data Ld' is slightly misaligned, the label data Ld' can still surround almost the entire area of the object.
  • In the verification, the default boxes Db and Db', having the same shapes as the label data Ld and Ld', were arranged shifted by N [pix] in the horizontal direction with respect to the label data Ld and Ld' at the time of learning.
  • As shown in FIG. 14, with the default box Db and the label data Ld before shape conversion, the Jaccard value was 42%, whereas with the shape-converted default box Db' and label data Ld' the Jaccard value improves to 57%.
  • The information processing device performs the shape conversion of the default box Db and the label data Ld in this way, and then performs default box matching at the time of learning to generate teacher data.
  • the information processing apparatus generates teacher data for each layer of the CNN using the learning method described above.
  • the information processing device performs machine learning for each layer of the network using teacher data corresponding to each layer, and detects objects from images by CNN after learning. Thereby, the information processing device can improve the detection accuracy of objects of various sizes.
  • the information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
  • the information processing apparatus can improve the object detection accuracy by performing machine learning using the teacher data according to the present embodiment.
  • As described above, the teacher data generation method according to the embodiment is a teacher data generation method executed by a computer, in which shape conversion is performed on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld assigned to the object Tg in the image, and a default box Db' to be used as the ground truth GT of the image is determined by matching the shape-converted default box Db against the label data Ld, thereby generating teacher data.
  • the information processing apparatus can improve object detection accuracy by machine-learning the CNN using the teacher data generated by the teacher data generation method according to the embodiment.
  • the shape conversion includes changing the aspect ratio of the default box Db and the label data Ld.
  • the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size.
  • changing the aspect ratio includes inverse conversion of the aspect ratio.
  • the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size.
  • shape conversion is performed without changing the center positions of the default box Db and the label data Ld.
  • By keeping the label data Ld and the label data Ld' aligned, even if the label data Ld' is slightly misaligned, the information processing device can ensure that the label data Ld' surrounds almost the entire area of the object.
  • teacher data is generated for each layer of the convolutional neural network.
  • the information processing device can improve the detection accuracy of objects of various sizes.
  • The teacher data generation program causes a computer to execute a procedure of performing shape conversion on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, and a procedure of determining the default box Db to be used as the ground truth of the image by matching the shape-converted default box Db against the label data Ld, thereby generating teacher data.
  • the computer can improve object detection accuracy by machine-learning the CNN using the teacher data generated by the teacher data generation method according to the embodiment.
  • the information processing device includes an information processing unit.
  • The information processing unit performs shape conversion on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld assigned to the object Tg in the image, determines the default box Db to be used as the ground truth of the image by matching the shape-converted default box Db against the label data Ld to generate teacher data, trains the convolutional neural network using the teacher data, and detects the object Tg from an image input to the convolutional neural network. Thereby, the information processing device can improve the object detection accuracy.
  • The information processing method is a method of detecting the object Tg executed by a computer, in which shape conversion is performed on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, a default box Db to be used as the ground truth of the image is determined by matching the shape-converted default box Db against the label data Ld to generate teacher data, a convolutional neural network is trained using the teacher data, and the object Tg is detected from an image input to the convolutional neural network. This allows the computer to improve the object detection accuracy.
  • The information processing program causes a computer to execute a procedure of performing shape conversion on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, a procedure of determining the default box Db to be used as the ground truth of the image by matching the shape-converted default box Db against the label data Ld to generate teacher data, a procedure of training a convolutional neural network using the teacher data, and a procedure of detecting the object Tg from an image input to the convolutional neural network. This allows the computer to improve the object detection accuracy.
  • (1) A teacher data generation method executed by a computer, comprising: performing shape transformation on a default box placed on a feature map extracted from an image and on label data given to an object in the image; and determining a default box to be the ground truth of the image by matching the shape-transformed default box against the label data, and generating teacher data.
  • (2) The teacher data generation method according to (1), wherein the shape transformation includes changing aspect ratios of the default box and the label data.
  • (3) The teacher data generation method according to (2), wherein changing the aspect ratio includes inverse transformation of the aspect ratio.
  • The teacher data generation method including generating the teacher data for each layer of the convolutional neural network.
  • (6) A teacher data generation program for causing a computer to execute: a procedure of performing shape transformation on a default box placed on a feature map extracted from an image and on label data given to an object in the image; and a procedure of determining a default box to be the ground truth of the image by matching the shape-transformed default box against the label data and generating teacher data.
  • An information processing apparatus comprising an information processing unit that detects an object from an image input to the convolutional neural network.
  • A computer-executed information processing method comprising: performing shape transformation on a default box placed on a feature map extracted from an image and on label data given to an object in the image; determining a default box to be the ground truth of the image by matching the shape-transformed default box against the label data to generate teacher data; training a convolutional neural network using the teacher data; and detecting an object from an image input to the convolutional neural network.
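The shape conversion and its effect on the Jaccard (IoU) value described in the items above can be illustrated with a short sketch. This is a minimal illustration rather than the patent's reference implementation: the box representation (x_min, y_min, x_max, y_max in feature-map pixels), the conversion rule (replacing width and height with their geometric mean so that the area and the center position P are preserved while the aspect ratio becomes 1:1), and the example sizes and shift amount are assumptions made for the example.

```python
# Minimal sketch of a center-preserving shape conversion toward a square, and of
# the Jaccard (IoU) value before and after the conversion under a horizontal
# shift. Illustrative only; box format and sizes are assumptions.
import math

Box = tuple  # (x_min, y_min, x_max, y_max) in feature-map pixels

def jaccard(a: Box, b: Box) -> float:
    """Overlap area divided by the area of the union of the two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def to_square(box: Box) -> Box:
    """Shape conversion: same center, same area, aspect ratio forced to 1:1."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    side = math.sqrt((box[2] - box[0]) * (box[3] - box[1]))
    return (cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)

def shift_x(box: Box, n: float) -> Box:
    """Shift a box horizontally by n pixels."""
    return (box[0] + n, box[1], box[2] + n, box[3])

if __name__ == "__main__":
    label_ld = (0.0, 0.0, 8.0, 16.0)          # vertically long label data Ld
    default_db = shift_x(label_ld, 4.0)       # default box Db shifted by 4 pix
    before = jaccard(label_ld, default_db)
    after = jaccard(to_square(label_ld), shift_x(to_square(label_ld), 4.0))
    print(f"Jaccard before shape conversion: {before:.1%}")  # about 33%
    print(f"Jaccard after shape conversion:  {after:.1%}")   # about 48%
```

With these illustrative numbers, the same 4-pixel shift costs much less overlap once both boxes have been converted toward squares, mirroring the 42% to 57% improvement reported above for default box matching before and after the shape conversion.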

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A teaching data generation method according to the present disclosure to be executed by a computer, said method including: the shape transformation of a default box (Db) positioned on a feature map (Fm) extracted from an image (Pa) and of label data (Ld) to be applied to an object in an image; and the generation of teaching data by determining the default box which is to be the ground truth of the image (Pa) by subjecting the shape-transformed default box (Db') and label data (Ld') to matching.

Description

Teacher data generation method, teacher data generation program, information processing device, information processing method, and information processing program
The present disclosure relates to a teacher data generation method, a teacher data generation program, an information processing device, an information processing method, and an information processing program.
There is an object detection model called SSD (Single Shot MultiBox Detector) (see Patent Document 1, for example). The SSD comprises a convolutional neural network (CNN) that is machine-learned to detect objects from input images. CNN machine learning uses teacher data in which the type (class) of an object included in the image and a rectangular ground truth (GT) indicating the area of the object in the image are given to the image.
When generating teacher data, first, a plurality of input images in which objects appear and labeled images, to which rectangular label data surrounding the object in each input image has been added, are prepared. Then, image features are extracted from each input image to generate a feature map.
Next, multiple default boxes with different aspect ratios and sizes are sequentially placed at arbitrary positions on the feature map. After that, the placed default boxes and the label data are matched to determine the default box to be the GT of the input image, thereby generating teacher data.
In the matching between a default box and the label data, an IoU (Intersection over Union) value is calculated, indicating the area of the overlapping portion of the default box and the label data with respect to the total area occupied by the default box and the label data in the feature map.
Then, a default box whose IoU value is greater than or equal to a threshold value is determined as the GT to generate teacher data. For this reason, as the GT, a default box whose position, size, and aspect ratio in the feature map are similar to those of the label data is necessarily selected.
During machine learning, a large amount of teacher data generated in this way is input to the CNN. The CNN repeats the process of detecting an object from each input teacher data image. The CNN then adjusts each parameter of the network so that the difference between the position, shape, and size of the detected object in the feature map and the position, shape, and size of the GT in the feature map is reduced.
As a result, the CNN can place the default boxes obtained through learning on feature maps extracted from unknown input images, and can derive the accuracy of the position, shape, and type of objects appearing in the input images from the pixel data in those default boxes.
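As one way to picture the inference step just described, the sketch below applies an SSD-style offset, predicted by the network for each default box, to that default box and keeps the detections whose class confidence is high enough. The (dcx, dcy, dw, dh) offset parameterisation and the 0.5 threshold are common SSD conventions assumed for illustration; they are not details taken from this patent.

```python
# Minimal sketch of SSD-style decoding at inference time: for each default box
# the network outputs class confidences and an offset; applying the offset to
# the default box yields the predicted object box. Illustrative assumptions only.
import math
from typing import Iterator, List, Tuple

CxCyWH = Tuple[float, float, float, float]  # (center x, center y, width, height)

def decode(default_box: CxCyWH, offset: CxCyWH) -> CxCyWH:
    """Apply a predicted (dcx, dcy, dw, dh) offset to a default box."""
    cx0, cy0, w0, h0 = default_box
    dcx, dcy, dw, dh = offset
    return (cx0 + dcx * w0, cy0 + dcy * h0, w0 * math.exp(dw), h0 * math.exp(dh))

def detect(default_boxes: List[CxCyWH],
           confidences: List[List[float]],
           offsets: List[CxCyWH],
           score_threshold: float = 0.5) -> Iterator[Tuple[int, float, CxCyWH]]:
    """Yield (class_id, score, box) for each sufficiently confident default box
    (non-maximum suppression is omitted for brevity)."""
    for db, conf, off in zip(default_boxes, confidences, offsets):
        best_class = max(range(len(conf)), key=conf.__getitem__)
        if conf[best_class] >= score_threshold:
            yield best_class, conf[best_class], decode(db, off)
```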
Patent Document 1: JP 2020-98455 A
However, when the CNN is trained with teacher data in which a default box whose aspect ratio in the feature map is similar to that of the label data is determined as the GT, the object detection accuracy may decrease.
Therefore, the present disclosure proposes a teacher data generation method, a teacher data generation program, an information processing device, an information processing method, and an information processing program that can improve object detection accuracy.
A teacher data generation method according to the present disclosure is a teacher data generation method executed by a computer, in which a shape transformation is performed on a default box arranged on a feature map extracted from an image and on label data given to an object in the image, and a default box to be used as the ground truth of the image is determined by matching the shape-transformed default box against the label data, thereby generating teacher data.
FIG. 1 is a block diagram showing a configuration example of a vehicle control system according to the present disclosure. FIG. 2 is a diagram showing an example of a general teacher data generation method. Several subsequent figures show verification results of object detection. Two figures show verification results of default box matching during learning. One figure shows an example of the shape conversion according to the embodiment. One figure shows a verification result of default box matching before the shape conversion, and another shows the verification result after the shape conversion. Further figures show verification results of object detection.
Below, embodiments of the present disclosure will be described in detail based on the drawings. In each of the following embodiments, the same parts are denoted by the same reference numerals, and redundant explanations are omitted.
[1. Configuration example of vehicle control system]
FIG. 1 is a block diagram showing a configuration example of a vehicle control system 11, which is an example of a mobile device control system to which the present technology is applied.
The vehicle control system 11 is provided in the vehicle 1 and performs processing related to driving support and automatic driving of the vehicle 1.
The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit) 21, a communication unit 22, a map information accumulation unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a driving support/automatic driving control unit 29, a DMS (Driver Monitoring System) 30, an HMI (Human Machine Interface) 31, and a vehicle control unit 32.
The vehicle control ECU 21, communication unit 22, map information accumulation unit 23, position information acquisition unit 24, external recognition sensor 25, in-vehicle sensor 26, vehicle sensor 27, storage unit 28, driving support/automatic driving control unit 29, driver monitoring system (DMS) 30, human machine interface (HMI) 31, and vehicle control unit 32 are connected via a communication network 41 so as to be able to communicate with each other. The communication network 41 is composed of an in-vehicle communication network, a bus, or the like conforming to a digital two-way communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark). The communication network 41 may be used selectively depending on the type of data to be transmitted; for example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data. Each part of the vehicle control system 11 may also be connected directly, without going through the communication network 41, using wireless communication intended for relatively short ranges, such as NFC (Near Field Communication) or Bluetooth (registered trademark).
The communication unit 22 communicates with various devices inside and outside the vehicle, other vehicles, servers, base stations, and the like, and transmits and receives various data.
The map information accumulation unit 23 accumulates one or both of a map obtained from the outside and a map created by the vehicle 1. For example, the map information accumulation unit 23 accumulates a three-dimensional high-precision map, a global map that is lower in accuracy than the high-precision map but covers a wide area, and the like.
The position information acquisition unit 24 receives GNSS signals from GNSS (Global Navigation Satellite System) satellites and acquires the position information of the vehicle 1. The acquired position information is supplied to the driving support/automatic driving control unit 29. Note that the position information acquisition unit 24 is not limited to the method using GNSS signals, and may acquire position information using beacons, for example.
The external recognition sensor 25 includes various sensors used for recognizing situations outside the vehicle 1 and supplies sensor data from each sensor to each part of the vehicle control system 11. The type and number of sensors included in the external recognition sensor 25 are arbitrary.
For example, the external recognition sensor 25 includes a camera 51, a radar 52, a LiDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) 53, and an ultrasonic sensor 54.
The in-vehicle sensor 26 includes various sensors for detecting information inside the vehicle, and supplies sensor data from each sensor to each part of the vehicle control system 11. The types and number of sensors included in the in-vehicle sensor 26 are not particularly limited as long as they can realistically be installed in the vehicle 1. For example, the in-vehicle sensor 26 may comprise one or more of cameras, radars, seat sensors, steering wheel sensors, microphones, and biometric sensors.
The vehicle sensor 27 includes various sensors for detecting the state of the vehicle 1, and supplies sensor data from each sensor to each section of the vehicle control system 11. The types and number of sensors included in the vehicle sensor 27 are not particularly limited as long as they can practically be installed in the vehicle 1. For example, the vehicle sensor 27 includes a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU) integrating them.
The storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and stores data and programs. The storage unit 28 is used as, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory) and a RAM (Random Access Memory); as the storage medium, a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied. The storage unit 28 stores various programs and data used by each unit of the vehicle control system 11.
The driving support/automatic driving control unit 29 controls driving support and automatic driving of the vehicle 1. For example, the driving support/automatic driving control unit 29 includes an analysis unit 61, an action planning unit 62, and an operation control unit 63.
The analysis unit 61 analyzes the vehicle 1 and its surroundings. The analysis unit 61 includes a self-position estimation unit 71, a sensor fusion unit 72, and a recognition unit 73.
The self-position estimation unit 71 estimates the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map accumulated in the map information accumulation unit 23.
The sensor fusion unit 72 performs sensor fusion processing that combines a plurality of different types of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the LiDAR 53 and the radar 52) to obtain new information. Methods for combining different types of sensor data include integration, fusion, federation, and the like.
The recognition unit 73 executes detection processing for detecting the situation outside the vehicle 1 and recognition processing for recognizing the situation outside the vehicle 1.
For example, the recognition unit 73 performs detection processing and recognition processing of the situation outside the vehicle 1 based on information from the external recognition sensor 25, information from the self-position estimation unit 71, information from the sensor fusion unit 72, and the like.
Specifically, for example, the recognition unit 73 performs detection processing and recognition processing of objects around the vehicle 1. Object detection processing is, for example, processing for detecting the presence or absence, size, shape, position, movement, and the like of an object. Object recognition processing is, for example, processing for recognizing an attribute such as the type of an object or identifying a specific object. However, detection processing and recognition processing are not always clearly separated, and may overlap.
For example, the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies a point cloud based on sensor data from the radar 52, the LiDAR 53, or the like into clusters of points. As a result, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects the movement of objects around the vehicle 1 by performing tracking that follows the movement of the clusters of points classified by the clustering. As a result, the speed and traveling direction (movement vector) of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects or recognizes vehicles, people, bicycles, obstacles, structures, roads, traffic lights, traffic signs, road markings, and the like based on image data supplied from the camera 51. Further, the recognition unit 73 may recognize the types of objects around the vehicle 1 by performing recognition processing such as semantic segmentation.
The action planning unit 62 creates an action plan for the vehicle 1. For example, the action planning unit 62 creates an action plan by performing route planning and route following processing.
Route planning (global path planning) is the process of planning a rough route from the start to the goal. This route planning also includes what is called trajectory planning: processing that, along the planned route, generates a trajectory (local path planning) on which the vehicle 1 can proceed safely and smoothly in its vicinity, in consideration of the motion characteristics of the vehicle 1.
The operation control unit 63 controls the operation of the vehicle 1 in order to implement the action plan created by the action planning unit 62.
The DMS 30 performs driver authentication processing, driver state recognition processing, and the like based on sensor data from the in-vehicle sensor 26 and input data input to the HMI 31, which will be described later. As the state of the driver to be recognized, for example, physical condition, wakefulness, concentration, fatigue, gaze direction, drunkenness, driving operation, posture, and the like are assumed. The HMI 31 inputs various data, instructions, and the like, and presents various data to the driver and others.
The vehicle control unit 32 controls each unit of the vehicle 1. The vehicle control unit 32 includes a steering control unit 81, a brake control unit 82, a drive control unit 83, a body system control unit 84, a light control unit 85, and a horn control unit 86.
The steering control unit 81 detects and controls the state of the steering system of the vehicle 1. The steering system includes, for example, a steering mechanism including a steering wheel, an electric power steering, and the like. The steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
The brake control unit 82 detects and controls the state of the brake system of the vehicle 1. The brake system includes, for example, a brake mechanism including a brake pedal, an ABS (Antilock Brake System), a regenerative brake mechanism, and the like. The brake control unit 82 includes, for example, a brake ECU that controls the brake system, an actuator that drives the brake system, and the like.
The drive control unit 83 detects and controls the state of the drive system of the vehicle 1. The drive system includes, for example, an accelerator pedal, a driving force generator for generating driving force such as an internal combustion engine or a driving motor, and a driving force transmission mechanism for transmitting the driving force to the wheels. The drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
The body system control unit 84 detects and controls the state of the body system of the vehicle 1. The body system includes, for example, a keyless entry system, a smart key system, a power window device, power seats, an air conditioner, airbags, seat belts, a shift lever, and the like. The body system control unit 84 includes, for example, a body system ECU that controls the body system, an actuator that drives the body system, and the like.
The light control unit 85 detects and controls the states of various lights of the vehicle 1. Lights to be controlled include, for example, headlights, back lights, fog lights, turn signals, brake lights, projections, bumper displays, and the like. The light control unit 85 includes a light ECU that controls the lights, an actuator that drives the lights, and the like.
The horn control unit 86 detects and controls the state of the car horn of the vehicle 1. The horn control unit 86 includes, for example, a horn ECU that controls the car horn, an actuator that drives the car horn, and the like.
[2.認識部が使用する物体検出モデルの一例]
 一般的な物体検出モデルとして、SSD(Single Shot MultiBox Detector)がある。SSDは、入力画像から物体を検出するように機械学習された畳み込みニューラルネットワーク(CNN:Convolutional Neural Network)を備える。CNNの機械学習には、画像に対して、画像に含まれる物体の種類(クラス)と画像における物体の領域を示す矩形状のグランドトゥルース(GT)とが付与された教師データが使用される。
[2. An example of an object detection model used by the recognition unit]
A general object detection model is an SSD (Single Shot MultiBox Detector). The SSD comprises a Convolutional Neural Network (CNN) that is machine-learned to detect objects from input images. CNN machine learning uses teacher data in which the type (class) of an object included in the image and a rectangular ground truth (GT) indicating the area of the object in the image are given to the image.
 図2は、一般的な教師データ生成方法の一例を示す図である。教師データを生成する場合には、まず、物体Tgが写る入力画像Paと、入力画像Paに写る物体Tgを囲む矩形状のラベルデータLdを付与したラベルデータ付画像Pbとを用意する。そして、入力画像Paから画像の特徴量を抽出して特徴マップFmを生成する。 FIG. 2 is a diagram showing an example of a general training data generation method. When generating teacher data, first, an input image Pa showing an object Tg and an image Pb with label data added with rectangular label data Ld surrounding the object Tg shown in the input image Pa are prepared. Then, the feature amount of the image is extracted from the input image Pa to generate the feature map Fm.
 続いて、特徴マップFm上の任意の位置に、CNNの階層毎にアスペクト比および大きさが異なる複数のデフォルトボックスDbを順次配置する。その後、配置したデフォルトボックスDbとラベルデータLdとをマッチングして、入力画像PaのGTにするデフォルトボックスDbを決定して教師データを生成する。 Subsequently, a plurality of default boxes Db with different aspect ratios and sizes for each layer of CNN are sequentially arranged at arbitrary positions on the feature map Fm. After that, the arranged default box Db and the label data Ld are matched to determine the default box Db to be the GT of the input image Pa, thereby generating teacher data.
 デフォルトボックスDbとラベルデータLdとのマッチングでは、デフォルトボックスDbとラベルデータLdとが特徴マップFmに占める総面積に対するデフォルトボックスDbとラベルデータLdとの重なり部分(図2に示す黒塗り領域)の面積を示すIoU(Intersection over Union)値を算出する。 In the matching between the default box Db and the label data Ld, an IoU (Intersection over Union) value is calculated that indicates the area of the overlapping portion of the default box Db and the label data Ld (the black area shown in FIG. 2) relative to the total area that the default box Db and the label data Ld occupy in the feature map Fm.
 そして、IoU値が閾値以上のデフォルトボックスDbをGTとして決定して教師データを生成する。このため、GTとしては、特徴マップFmにおける位置、大きさ、およびアスペクト比がラベルデータLdと類似したデフォルトボックスDbが必然的に選択される。 Then, a default box Db whose IoU value is equal to or greater than the threshold value is determined as GT to generate teacher data. Therefore, as GT, the default box Db whose position, size and aspect ratio in the feature map Fm are similar to those of the label data Ld is necessarily selected.
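 By way of a non-limiting illustration, the matching described above can be sketched in a few lines of Python. The (x_min, y_min, x_max, y_max) box format, the helper names, and the 0.5 threshold are assumptions made for this example only and are not taken from the present publication.

    # Minimal sketch of the conventional default-box matching described above.
    # Box format (x_min, y_min, x_max, y_max) and the 0.5 threshold are assumptions.
    from typing import List, Tuple

    Box = Tuple[float, float, float, float]

    def iou(a: Box, b: Box) -> float:
        """Intersection over Union of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def match_default_boxes(default_boxes: List[Box], label: Box,
                            threshold: float = 0.5) -> List[int]:
        """Return the indices of the default boxes selected as GT for one label box."""
        return [i for i, db in enumerate(default_boxes) if iou(db, label) >= threshold]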
 機械学習時には、こうして生成された多数の教師データがCNNに入力される。CNNは、教師データとなる複数の各入力画像Paから物体Tgを検出する処理を繰り返す。そして、CNNは、検出した特徴マップFmにおける物体Tgの位置、形状、および大きさと、特徴マップFmにおけるGTの位置、形状、および大きさとの差が小さくなるように、ネットワークの各パラメータを調整する。 During machine learning, a large number of pieces of teacher data generated in this way are input to the CNN. The CNN repeats the process of detecting the object Tg from each of the input images Pa serving as teacher data. The CNN then adjusts each parameter of the network so that the difference between the position, shape, and size of the detected object Tg in the feature map Fm and the position, shape, and size of the GT in the feature map Fm becomes small.
 これにより、CNNは、未知の入力画像から抽出する特徴マップに、学習によって修得したデフォルトボックスDbを配置し、デフォルトボックスDb内の画素データから入力画像に写る物体の位置、形状、および種類の確度を導出することができる。 As a result, the CNN can place the default boxes Db acquired through learning on a feature map extracted from an unknown input image, and can derive, from the pixel data within each default box Db, the position, shape, and class confidence of an object appearing in the input image.
[3.一般的なSSDによる物体検出の検証]
 しかしながら、SSDは、走行する自車両の揺れや検出対象物の進路変更が発生する場合に、物体検出精度が低下することがある。これは、入力画像から抽出した特徴マップ上に配置するデフォルトボックスDbの中心位置と、入力画像における物体Tgの中心位置との距離が拡大することが原因と推測できる。
[3. Verification of object detection by general SSD]
However, in the case of the SSD, the accuracy of object detection may deteriorate when the own vehicle shakes or the course of the object to be detected changes. It can be assumed that this is because the distance between the center position of the default box Db arranged on the feature map extracted from the input image and the center position of the object Tg in the input image increases.
 そこで、配置したデフォルトボックスDbに対して、横方向(左右両方向)へ1[pix]ずつ物体Tgの画像(GT)の位置をずらしながら物体検出する検証を行い、それぞれのGTにマッチした検出結果を全てのずれに対して取得した。 Therefore, verification of object detection was performed while shifting the position of the image (GT) of the object Tg by 1 [pix] at a time in the horizontal direction (both left and right) with respect to the arranged default box Db, and the detection results matched to each GT were obtained for all of the shifts.
 そして、検出結果のスコアの最大値が1.0となるように上界を正規化した結果、図3に示す評価結果が得られた。図3に示すデータの膨らみは、その部分(ずれ位置)におけるデータの多さを表している。図3に示すように、評価結果では、デフォルトボックスDbに対するGTの横方向へのずれ量が大きいほど、検出結果のスコアが低くなっている。 Then, as a result of normalizing the upper bound so that the maximum value of the score of the detection result is 1.0, the evaluation result shown in FIG. 3 was obtained. The swelling of the data shown in FIG. 3 represents the amount of data in that portion (shift position). As shown in FIG. 3, in the evaluation results, the larger the amount of lateral displacement of GT with respect to the default box Db, the lower the score of the detection result.
 さらに、物体のクラス(種類)別に同様の検証を行った結果、図4~図7に示す評価結果が得られた。図4に示すように、自動車の画像では、デフォルトボックスDbに対するGTの横方向へのずれ量が増大しても、検出結果のスコアは大きくは低下しない。 Furthermore, as a result of conducting the same verification for each object class (type), the evaluation results shown in Figures 4 to 7 were obtained. As shown in FIG. 4, in the automobile image, the score of the detection result does not decrease significantly even if the amount of lateral displacement of GT with respect to the default box Db increases.
 これに対して、図5に示すように、オートバイの画像では、デフォルトボックスDbに対するGTの横方向へのずれ量が増大するほど、検出結果のスコアは大きく低下する。 On the other hand, as shown in FIG. 5, in the motorcycle image, the greater the amount of lateral displacement of the GT with respect to the default box Db, the more the score of the detection result drops.
 一方、図6に示すように、自転車の画像では、デフォルトボックスDbに対するGTの横方向へのずれ量が増大しても、検出結果のスコアに変動は少ない。これに対して、図7に示すように、人の画像では、デフォルトボックスDbに対するGTの横方向へのずれ量が増大するほど、検出結果のスコアは大きく低下する。 On the other hand, as shown in FIG. 6, in the image of the bicycle, even if the amount of lateral displacement of the GT with respect to the default box Db increases, the score of the detection result does not fluctuate much. On the other hand, as shown in FIG. 7, in the image of a person, the score of the detection result greatly decreases as the amount of lateral displacement of GT with respect to the default box Db increases.
 ここで、画像中の自動車、オートバイ、自転車、および人の画像における形状に注目すると、自動車の形状は、略正方形または横長矩形である。オートバイの形状は、オートバイの進行方向が自車両と同一方向または逆方向の場合、縦長矩形である。 Here, focusing on the shapes of the automobile, motorcycle, bicycle, and person in the images, the shape of an automobile is substantially square or a horizontally long rectangle. The shape of a motorcycle is a vertically long rectangle when the direction of travel of the motorcycle is the same as or opposite to that of the own vehicle.
 自転車の形状は、自転車は、自車両の前方を横切る場合は多いが、自車両の前方で自車両と同一方向または逆方向に走行する場合は少ない。このため、自転車の形状は、略正方形または横長矩形である。人の形状は、縦長矩形である。 As for the shape of the bicycle, it often crosses in front of the own vehicle, but rarely runs in front of the own vehicle in the same or opposite direction as the own vehicle. Therefore, the shape of the bicycle is approximately square or oblong rectangle. The shape of a person is a vertically long rectangle.
 このことから、デフォルトボックスDbに対するGTの横方向へのずれ量が増大した場合、画像中の形状が略正方形または横長矩形の物体では検出結果のスコアは大きくは低下せず、縦長矩形の物体では検出結果のスコアは大きくは低下することが推測できる。 From this, it can be inferred that, when the amount of lateral displacement of the GT with respect to the default box Db increases, the score of the detection result does not decrease significantly for an object whose shape in the image is substantially square or a horizontally long rectangle, whereas the score decreases greatly for an object whose shape is a vertically long rectangle.
 このため、画像中の物体の形状毎に同様の検証を行った結果、図8~図10に示す評価結果が得られた。この検証結果によって、図8~図10に示すように、デフォルトボックスDbに対するGTの横方向へのずれ量が増大した場合、画像中の形状が略正方形または横長矩形の物体では検出結果のスコアは大きくは低下せず、縦長矩形の物体では検出結果のスコアは大きくは低下することが実証された。 For this reason, similar verification was performed for each shape of object in the image, and the evaluation results shown in FIGS. 8 to 10 were obtained. As shown in FIGS. 8 to 10, these verification results demonstrated that, when the amount of lateral displacement of the GT with respect to the default box Db increases, the score of the detection result does not decrease significantly for an object whose shape in the image is substantially square or a horizontally long rectangle, whereas the score decreases greatly for a vertically long rectangular object.
[4.学習時におけるデフォルトボックスマッチングの検証]
 上記した現象の発生は、学習時におけるデフォルトボックスDbのマッチングに問題があるという仮説に基づいて、学習時にラベルデータLdと同一形状のデフォルトボックスDbをラベルデータLdに対して横方向に4[pix]移動させてJaccard(IoU値)を算出すると、図11~図12に示す結果が得られた。
[4. Verification of default box matching during learning]
 The above phenomenon was examined based on the hypothesis that there is a problem with the matching of the default box Db during learning. When a default box Db having the same shape as the label data Ld was shifted by 4 [pix] in the horizontal direction with respect to the label data Ld during learning and the Jaccard index (IoU value) was calculated, the results shown in FIGS. 11 and 12 were obtained.
 図11に示すように、ラベルデータLdが正方形であり、デフォルトボックスDbがラベルデータLdと同じ大きさの正方形である場合に、ラベルデータLdとデフォルトボックスDbとの位置が一致していれば、Jaccardは、100%になる。この状態からラベルデータLdを右方向へ4[pix]ずらすと、Jaccardは、78.3%になるが、大きくは低下しない。 As shown in FIG. 11, when the label data Ld is a square and the default box Db is a square of the same size as the label data Ld, if the positions of the label data Ld and the default box Db match, Jaccard goes to 100%. If the label data Ld is shifted to the right by 4 [pix] from this state, Jaccard becomes 78.3%, but does not drop significantly.
 一方、図12に示すように、ラベルデータLdが縦長矩形であり、デフォルトボックスDbがラベルデータLdと同じアスペクト比で大きさの縦長矩形である場合に、ラベルデータLdとデフォルトボックスDbとの位置が一致していれば、Jaccardは、100%になる。この状態からラベルデータLdを右方向へ4[pix]ずらすと、Jaccardは、53.2%と、約半分まで低下する。 On the other hand, as shown in FIG. 12, when the label data Ld is a vertically long rectangle and the default box Db is a vertically long rectangle having the same aspect ratio and size as the label data Ld, the Jaccard index is 100% if the positions of the label data Ld and the default box Db match. If the label data Ld is shifted to the right by 4 [pix] from this state, the Jaccard index drops to 53.2%, roughly half.
 このように、ラベルデータLdおよびデフォルトボックスDbが同一形状で同一サイズの縦長矩形の場合、実際の物体検出時にデフォルトボックスDbを適用すると、特徴マップFmから抽出される物体(GT)の特徴量が全体の約半分になり検出精度が低下する。 Thus, when the label data Ld and the default box Db are vertically long rectangles of the same shape and size, applying the default box Db during actual object detection means that the feature amount of the object (GT) extracted from the feature map Fm becomes only about half of the whole, and the detection accuracy decreases.
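 The sensitivity to a horizontal shift can also be checked numerically. For two identical boxes of width w offset horizontally by a shift of Δ pixels, the IoU reduces to (w - Δ)/(w + Δ) regardless of the box height, so a narrow vertically long box loses far more overlap than a wider square. The box sizes used in the sketch below (a 33 x 33 square and a 13 x 26 rectangle) are illustrative assumptions chosen to roughly reproduce the 78.3% and 53.2% figures above; the publication does not state the exact dimensions.

    # Sketch reproducing the Jaccard (IoU) drop for a 4 px horizontal shift.
    # Box sizes are illustrative assumptions; only the 4 px shift comes from the text.
    def shifted_iou(width: float, height: float, shift: float) -> float:
        """IoU of two identical width x height boxes offset horizontally by `shift`."""
        inter = max(0.0, width - shift) * height
        union = 2.0 * width * height - inter
        return inter / union

    print(shifted_iou(33, 33, 4))   # square of side 33:            ~0.784 (cf. 78.3% in FIG. 11)
    print(shifted_iou(13, 26, 4))   # 13 x 26 vertically long box:  ~0.529 (cf. 53.2% in FIG. 12)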
[5.デフォルトボックスおよびラベルデータの形状変換]
 そこで、本開示に係る教師データ生成方法では、認識部73に含まれる情報処理装置が、画像から抽出される特徴マップ上に配置されるデフォルトボックスDbおよび画像における物体に付与されるラベルデータLdに対して形状変換を行う。そして、情報処理装置は、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルース(GT)にするデフォルトボックスDbを決定して教師データを生成する。
[5. Shape conversion of default box and label data]
 Therefore, in the teacher data generation method according to the present disclosure, the information processing device included in the recognition unit 73 performs shape transformation on the default box Db arranged on the feature map extracted from the image and on the label data Ld given to the object in the image. The information processing device then determines the default box Db to be used as the ground truth (GT) of the image by matching the shape-transformed default box Db against the label data Ld, and generates teacher data.
 これにより、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの形状を正方形に近付ける形状変換を行うことで、学習時のデフォルトボックスマッチングにおいて、Jaccardを向上させることができる。したがって、情報処理装置は、本実施形態に係る教師データを使用して機械学習することにより、物体検出精度を向上させることができる。 As a result, the information processing device can improve Jaccard in default box matching during learning by performing shape conversion that brings the shapes of the default box Db and the label data Ld closer to a square. Therefore, the information processing apparatus can improve object detection accuracy by performing machine learning using the teacher data according to the present embodiment.
 例えば、図13に示すように、当初のラベルデータLdが人の画像を囲む縦横比が2対1の縦長矩形であり、デフォルトボックスDbがラベルデータLdと同一のアスペクト比であり、同一の形状であるとする。この場合、情報処理装置は、デフォルトボックスDbおよびラベルデータLdのアスペクト比を変更する。これにより、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの形状を同じ大きさの正方形に近付けることができる。 For example, as shown in FIG. 13, suppose that the initial label data Ld is a vertically long rectangle with an aspect ratio of 2:1 surrounding the image of a person, and that the default box Db has the same aspect ratio and the same shape as the label data Ld. In this case, the information processing device changes the aspect ratios of the default box Db and the label data Ld. As a result, the information processing device can bring the shapes of the default box Db and the label data Ld closer to squares of the same size.
 情報処理装置が行うアスペクト比の変更は、アスペクト比の逆変換を含む。これにより、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの形状を同じ大きさの正方形に近付けることができる。例えば、情報処理装置は、図13に示すデフォルトボックスDbおよびラベルデータLdの場合、デフォルトボックスDbおよびラベルデータLdの縦の長さを√2倍にし、横の長さを1/√2倍にして、デフォルトボックスDb´およびラベルデータLd´を生成する。 The aspect ratio change performed by the information processing device includes an inverse transformation of the aspect ratio. As a result, the information processing device can bring the shapes of the default box Db and the label data Ld closer to squares of the same size. For example, in the case of the default box Db and the label data Ld shown in FIG. 13, the information processing device multiplies the vertical length of the default box Db and the label data Ld by √2 and the horizontal length by 1/√2 to generate the default box Db' and the label data Ld'.
 このとき、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの中心位置Pを変化させずに形状変換を行い、デフォルトボックスDb´およびラベルデータLd´を生成する。これにより、情報処理装置は、ラベルデータLdとラベルデータLd´とをマッチングすることで、ラベルデータLd´の配置が多少ずれていても、ラベルデータLd´によって物体の領域のほぼ全体を囲むことができる。 At this time, the information processing device performs the shape transformation without changing the center position P of the default box Db and the label data Ld, and generates the default box Db' and the label data Ld'. As a result, by matching the label data Ld against the label data Ld', the information processing device can surround almost the entire area of the object with the label data Ld' even if the placement of the label data Ld' is slightly off.
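 One plausible way to realize such a transformation is to rescale each box toward a square of the same area while keeping its center fixed; for a 2:1 box this amounts to scaling one side by √2 and the other by 1/√2. The sketch below illustrates that idea under an assumed (x_min, y_min, x_max, y_max) box format and is not code taken from the publication.

    # Sketch of an area- and center-preserving shape transformation toward a square.
    # Box format (x_min, y_min, x_max, y_max) is an assumption for illustration.
    import math
    from typing import Tuple

    Box = Tuple[float, float, float, float]

    def to_square(box: Box) -> Box:
        """Return a square box with the same area and the same center as `box`."""
        w = box[2] - box[0]
        h = box[3] - box[1]
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        side = math.sqrt(w * h)  # geometric mean: a 2:1 box has its sides scaled by 1/sqrt(2) and sqrt(2)
        return (cx - side / 2.0, cy - side / 2.0,
                cx + side / 2.0, cy + side / 2.0)

    # Example: a 16 x 32 (2:1) label box becomes roughly 22.6 x 22.6 around the same center.
    print(to_square((0.0, 0.0, 16.0, 32.0)))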
 デフォルトボックスDbおよびラベルデータLdの形状変換を行う前後で、学習時にラベルデータLd,Ld´と同一形状のデフォルトボックスDb,Db´をラベルデータLd,Ld´に対して横方向にN[pix]移動させてJaccardを算出すると、図14~図15に示す結果が得られた。 Before and after the shape transformation of the default box Db and the label data Ld, the default boxes Db and Db', which have the same shapes as the label data Ld and Ld', were moved by N [pix] in the horizontal direction with respect to the label data Ld and Ld' during learning, and the Jaccard index was calculated, yielding the results shown in FIGS. 14 and 15.
 図14に示すように、形状変換前のデフォルトボックスDbおよびラベルデータLdでは、Jaccardが42%であったのに対して、図15に示すように、形状変換後のラベルデータLdとラベルデータLd´では、Jaccardが57%まで向上できる。 As shown in FIG. 14, the Jaccard index was 42% for the default box Db and the label data Ld before the shape transformation, whereas, as shown in FIG. 15, the Jaccard index improves to 57% for the label data Ld and the label data Ld' after the shape transformation.
 情報処理装置は、このようにデフォルトボックスDbおよびラベルデータLdの形状変換を行った上で、学習時にデフォルトボックスマッチングを行うことで、Jaccardがより高いデフォルトボックスDb´をGTに決定して教師データを生成する。 The information processing device performs the shape transformation of the default box Db and the label data Ld in this way and then performs default box matching during learning, thereby determining the default box Db' with the higher Jaccard index as the GT and generating teacher data.
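 Putting the pieces of this section together, the flow of transforming both the default boxes and the label data and then keeping the default boxes whose Jaccard index clears a threshold could look like the following self-contained sketch; the helper names, the box format, and the 0.5 threshold are assumptions made for illustration only.

    # Self-contained sketch of the teacher-data generation flow in this section:
    # shape-transform the boxes, then select GT default boxes by IoU matching.
    # Box format, helper names, and the 0.5 threshold are assumptions.
    import math
    from typing import List, Tuple

    Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

    def iou(a: Box, b: Box) -> float:
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def to_square(box: Box) -> Box:
        w, h = box[2] - box[0], box[3] - box[1]
        cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
        side = math.sqrt(w * h)
        return (cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)

    def generate_teacher_data(default_boxes: List[Box], label: Box,
                              threshold: float = 0.5) -> List[int]:
        """Indices of the default boxes chosen as GT after the shape transformation."""
        label_t = to_square(label)
        return [i for i, db in enumerate(default_boxes)
                if iou(to_square(db), label_t) >= threshold]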
 また、CNNでは、ネットワークの浅い階層ほど小さな物体を検出し、階層が深くなるほど大きな物体を検出する。そこで、情報処理装置は、上記した学習方法による教師データの生成をCNNの階層毎に行う。 In addition, in CNN, smaller objects are detected in shallower layers of the network, and larger objects are detected in deeper layers. Therefore, the information processing apparatus generates teacher data for each layer of the CNN using the learning method described above.
 そして、情報処理装置は、ネットワークの階層毎に、各階層に対応した教師データを使用して機械学習を行い、学習後のCNNによって画像から物体検出を行う。これにより、情報処理装置は、様々な大きさの物体の検出精度を向上させることができる。情報処理装置は、記憶部28に記憶された情報処理プログラムを実行することによって、上記したCNNの機械学習および物体検出処理を行う。 Then, the information processing device performs machine learning for each layer of the network using teacher data corresponding to each layer, and detects objects from images by CNN after learning. Thereby, the information processing device can improve the detection accuracy of objects of various sizes. The information processing device executes the information processing program stored in the storage unit 28 to perform the above-described CNN machine learning and object detection processing.
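 Because shallow layers handle small objects and deep layers handle large ones, the default boxes placed on each layer's feature map are scaled per layer. The sketch below follows the commonly used SSD convention of linearly spaced scales between s_min and s_max; the concrete values of s_min, s_max, and the aspect-ratio set are assumptions, not parameters taken from the publication.

    # Sketch of per-layer default-box sizes in the usual SSD style.
    # s_min, s_max, and the aspect-ratio set are illustrative assumptions.
    import math
    from typing import Dict, List, Tuple

    def layer_default_box_sizes(num_layers: int,
                                s_min: float = 0.2,
                                s_max: float = 0.9,
                                aspect_ratios: Tuple[float, ...] = (1.0, 2.0, 0.5)
                                ) -> Dict[int, List[Tuple[float, float]]]:
        """(width, height) of the default boxes for each layer, relative to the image size."""
        sizes: Dict[int, List[Tuple[float, float]]] = {}
        for k in range(1, num_layers + 1):
            s_k = s_min + (s_max - s_min) * (k - 1) / (num_layers - 1)  # scale of layer k
            sizes[k] = [(s_k * math.sqrt(ar), s_k / math.sqrt(ar)) for ar in aspect_ratios]
        return sizes

    # Shallow layers (small s_k) cover small objects; deep layers cover large ones.
    for layer, boxes in layer_default_box_sizes(6).items():
        print(layer, [(round(w, 2), round(h, 2)) for w, h in boxes])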
 デフォルトボックスDbおよびラベルデータLdの形状変換前の教師データを使用して機械学習したCNNと、形状変換後の教師データを使用して機械学習したCNNとによって、図8および図10と同様の検証を行った結果、図16~図19の結果が得られた。 Verification similar to that in FIGS. 8 and 10 was performed using a CNN machine-learned with teacher data from before the shape transformation of the default box Db and the label data Ld and a CNN machine-learned with teacher data from after the shape transformation, and the results shown in FIGS. 16 to 19 were obtained.
 図16および図17に示すように、物体およびデフォルトボックスDbの形状が正方形の場合、形状変換前後で物体検出結果のスコアに大きな違いは見られない。一方、図18に示すように、物体およびデフォルトボックスDbの形状が縦長矩形の場合、形状変換前では、物体とデフォルトボックスDbとの横方向のズレ量が増大するほど物体検出結果のスコアが低下している。 As shown in FIGS. 16 and 17, when the shapes of the object and the default box Db are square, there is no significant difference in the score of the object detection result before and after the shape transformation. On the other hand, as shown in FIG. 18, when the shapes of the object and the default box Db are vertically long rectangles, before the shape transformation the score of the object detection result decreases as the amount of horizontal displacement between the object and the default box Db increases.
 これに対して、図19に示すように、形状変換後では、物体およびデフォルトボックスDbの形状が縦長矩形の場合、物体検出結果のスコアが改善されている。このように、情報処理装置は、本実施形態に係る教師データを使用して機械学習を行うことにより、物体の検出精度を向上させることができる。 On the other hand, as shown in FIG. 19, after shape conversion, when the shape of the object and default box Db is a vertically long rectangle, the score of the object detection result is improved. Thus, the information processing apparatus can improve the object detection accuracy by performing machine learning using the teacher data according to the present embodiment.
[6.効果]
 実施形態に係る教師データ生成方法は、コンピュータが実行する教師データ生成方法であって、画像から抽出される特徴マップFm上に配置されるデフォルトボックスDbおよび画像における物体Tgに付与されるラベルデータLdに対して形状変換を行い、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルースGTにするデフォルトボックスDb´を決定して教師データを生成する。これにより、情報処理装置は、実施形態に係る教師データ生成方法によって生成された教師データを使用してCNNを機械学習することによって、物体の検出精度を向上させることができる。
[6. effect]
 The teacher data generation method according to the embodiment is a teacher data generation method executed by a computer, in which shape transformation is performed on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, and the default box Db' to be used as the ground truth GT of the image is determined by matching the shape-transformed default box Db against the label data Ld, thereby generating teacher data. As a result, the information processing device can improve object detection accuracy by machine-learning the CNN using the teacher data generated by the teacher data generation method according to the embodiment.
 また、形状変換は、デフォルトボックスDbおよびラベルデータLdのアスペクト比の変更を含む。これにより、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの形状を同じ大きさの正方形に近付けることができる。 Also, the shape conversion includes changing the aspect ratio of the default box Db and the label data Ld. As a result, the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size.
 また、アスペクト比の変更は、アスペクト比の逆変換を含む。これにより、情報処理装置は、デフォルトボックスDbおよびラベルデータLdの形状を同じ大きさの正方形に近付けることができる。 Also, changing the aspect ratio includes inverse conversion of the aspect ratio. As a result, the information processing device can approximate the shapes of the default box Db and the label data Ld to squares of the same size.
 また、デフォルトボックスDbおよびラベルデータLdの中心位置を変化させずに形状変換を行う。これにより、情報処理装置は、ラベルデータLdとラベルデータLd´とをマッチングすることで、ラベルデータLd´の配置が多少ずれていても、ラベルデータLd´によって物体の領域のほぼ全体を囲むことができる。 In addition, the shape transformation is performed without changing the center positions of the default box Db and the label data Ld. As a result, by matching the label data Ld against the label data Ld', the information processing device can surround almost the entire area of the object with the label data Ld' even if the placement of the label data Ld' is slightly off.
 また、畳み込みニューラルネットワークの階層毎に、教師データの生成を行う。これにより、情報処理装置は、様々な大きさの物体の検出精度を向上させることができる。 In addition, teacher data is generated for each layer of the convolutional neural network. Thereby, the information processing device can improve the detection accuracy of objects of various sizes.
 また、実施形態に係る教師データ生成プログラムは、画像から抽出される特徴マップFm上に配置されるデフォルトボックスDbおよび画像における物体Tgに付与されるラベルデータLdに対して形状変換を行う手順と、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルースにするデフォルトボックスDbを決定して教師データを生成する手順とをコンピュータに実行させる。これにより、コンピュータは、実施形態に係る教師データ生成方法によって生成された教師データを使用してCNNを機械学習することによって、物体の検出精度を向上させることができる。 In addition, the teacher data generation program according to the embodiment causes a computer to execute a procedure of performing shape transformation on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, and a procedure of determining the default box Db to be used as the ground truth of the image by matching the shape-transformed default box Db against the label data Ld and generating teacher data. As a result, the computer can improve object detection accuracy by machine-learning the CNN using the teacher data generated by the teacher data generation method according to the embodiment.
 また、実施形態に係る情報処理装置は、情報処理部を備える。情報処理部は、画像から抽出される特徴マップFm上に配置されるデフォルトボックスDbおよび画像における物体Tgに付与されるラベルデータLdに対して形状変換を行い、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルースにするデフォルトボックスDbを決定して教師データを生成し、教師データを使用して畳み込みニューラルネットワークを学習し、畳み込みニューラルネットワークに入力される画像から物体Tgを検出する。これにより、情報処理装置は、物体の検出精度を向上させることができる。 In addition, the information processing device according to the embodiment includes an information processing unit. The information processing unit performs shape transformation on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, determines the default box Db to be used as the ground truth of the image by matching the shape-transformed default box Db against the label data Ld to generate teacher data, trains the convolutional neural network using the teacher data, and detects the object Tg from an image input to the convolutional neural network. As a result, the information processing device can improve object detection accuracy.
 また、実施形態に係る情報処理方法は、コンピュータが実行する物体Tg検出方法であって、画像から抽出される特徴マップFm上に配置されるデフォルトボックスDbおよび画像における物体Tgに付与されるラベルデータLdに対して形状変換を行い、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルースにするデフォルトボックスDbを決定して教師データを生成し、教師データを使用して畳み込みニューラルネットワークを学習し、畳み込みニューラルネットワークに入力される画像から物体Tgを検出する。これにより、コンピュータは、物体の検出精度を向上させることができる。 In addition, the information processing method according to the embodiment is a method executed by a computer for detecting an object Tg, in which shape transformation is performed on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, the default box Db to be used as the ground truth of the image is determined by matching the shape-transformed default box Db against the label data Ld to generate teacher data, a convolutional neural network is trained using the teacher data, and the object Tg is detected from an image input to the convolutional neural network. As a result, the computer can improve object detection accuracy.
 また、実施形態に係る情報処理プログラムは、画像から抽出される特徴マップFm上に配置されるデフォルトボックスDbおよび画像における物体Tgに付与されるラベルデータLdに対して形状変換を行う手順と、形状変換後のデフォルトボックスDbとラベルデータLdとのマッチングによって、画像のグランドトゥルースにするデフォルトボックスDbを決定して教師データを生成する手順と、教師データを使用して畳み込みニューラルネットワークを学習する手順と、畳み込みニューラルネットワークに入力される画像から物体Tgを検出する手順とをコンピュータに実行させる。これにより、コンピュータは、物体の検出精度を向上させることができる。 In addition, the information processing program according to the embodiment causes a computer to execute a procedure of performing shape transformation on the default box Db arranged on the feature map Fm extracted from an image and on the label data Ld given to the object Tg in the image, a procedure of determining the default box Db to be used as the ground truth of the image by matching the shape-transformed default box Db against the label data Ld and generating teacher data, a procedure of training a convolutional neural network using the teacher data, and a procedure of detecting the object Tg from an image input to the convolutional neural network. As a result, the computer can improve object detection accuracy.
 なお、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、また他の効果があってもよい。 It should be noted that the effects described in this specification are only examples and are not limited, and other effects may also occur.
 なお、本技術は以下のような構成も取ることができる。
(1)
 コンピュータが実行する教師データ生成方法であって、
 画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
 前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する
 ことを含む教師データ生成方法。
(2)
 前記形状変換は、
 前記デフォルトボックスおよび前記ラベルデータのアスペクト比の変更
 を含む(1)に記載の教師データ生成方法。
(3)
 前記アスペクト比の変更は、
 前記アスペクト比の逆変換
 を含む(2)に記載の教師データ生成方法。
(4)
 前記デフォルトボックスおよび前記ラベルデータの中心位置を変化させずに前記形状変換を行う
 ことを含む(1)から(3)のいずれか一つに記載の教師データ生成方法。
(5)
 畳み込みニューラルネットワークの階層毎に、前記教師データの生成を行う
 ことを含む(1)から(4)のいずれか一つに記載の教師データ生成方法。
(6)
 画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行う手順と、
 前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する手順と
 をコンピュータに実行させる教師データ生成プログラム。
(7)
 画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
 前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成し、
 前記教師データを使用して畳み込みニューラルネットワークを学習し、
 前記畳み込みニューラルネットワークに入力される画像から物体を検出する情報処理部
 を備える情報処理装置。
(8)
 コンピュータが実行する情報処理方法であって、
 画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
 前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成し、
 前記教師データを使用して畳み込みニューラルネットワークを学習し、
 前記畳み込みニューラルネットワークに入力される画像から物体を検出する
 ことを含む情報処理方法。
(9)
 画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行う手順と、
 前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する手順と、
 前記教師データを使用して畳み込みニューラルネットワークを学習する手順と、
 前記畳み込みニューラルネットワークに入力される画像から物体を検出する手順と
 をコンピュータに実行させる情報処理プログラム。
Note that the present technology can also take the following configuration.
(1)
A training data generation method executed by a computer,
performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
A teacher data generation method, comprising: determining a default box to be ground truth of the image by matching the default box after the shape transformation and label data, and generating teacher data.
(2)
The shape transformation is
The teacher data generating method according to (1), including changing aspect ratios of the default box and the label data.
(3)
Changing the aspect ratio includes:
The teacher data generation method according to (2), including inverse transformation of the aspect ratio.
(4)
The teacher data generation method according to any one of (1) to (3), including performing the shape conversion without changing the center positions of the default box and the label data.
(5)
The teacher data generating method according to any one of (1) to (4), including generating the teacher data for each layer of the convolutional neural network.
(6)
a procedure of shape transformation for a default box placed on a feature map extracted from an image and label data given to an object in the image;
A training data generation program for causing a computer to execute a procedure of determining a default box to be the ground truth of the image by matching the default box after the shape transformation with the label data and generating training data.
(7)
performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate teacher data;
training a convolutional neural network using the training data;
An information processing apparatus comprising an information processing unit that detects an object from an image input to the convolutional neural network.
(8)
A computer-executed information processing method comprising:
performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate teacher data;
training a convolutional neural network using the training data;
An information processing method comprising detecting an object from an image input to the convolutional neural network.
(9)
a procedure of shape transformation for a default box placed on a feature map extracted from an image and label data given to an object in the image;
a step of determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate training data;
training a convolutional neural network using the training data;
An information processing program for causing a computer to execute a procedure for detecting an object from an image input to the convolutional neural network.
 Pa 入力画像
 Pb ラベルデータ付画像
 Fm 特徴マップ
 Db,Db´ デフォルトボックス
 Ld,Ld´ ラベルデータ
Pa: input image
Pb: image with label data
Fm: feature map
Db, Db': default box
Ld, Ld': label data

Claims (9)

  1.  コンピュータが実行する教師データ生成方法であって、
     画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
     前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する
     ことを含む教師データ生成方法。
    A training data generation method executed by a computer,
    performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
    A teacher data generation method, comprising: determining a default box to be ground truth of the image by matching the default box after the shape transformation and label data, and generating teacher data.
  2.  前記形状変換は、
     前記デフォルトボックスおよび前記ラベルデータのアスペクト比の変更
     を含む請求項1に記載の教師データ生成方法。
    The shape transformation is
    2. The teaching data generation method according to claim 1, further comprising: changing aspect ratios of said default box and said label data.
  3.  前記アスペクト比の変更は、
     前記アスペクト比の逆変換
     を含む請求項2に記載の教師データ生成方法。
    Changing the aspect ratio includes:
    3. The teacher data generation method according to claim 2, further comprising inverse transformation of the aspect ratio.
  4.  前記デフォルトボックスおよび前記ラベルデータの中心位置を変化させずに前記形状変換を行う
     ことを含む請求項1に記載の教師データ生成方法。
    2. The teaching data generating method according to claim 1, further comprising performing said shape conversion without changing center positions of said default box and said label data.
  5.  畳み込みニューラルネットワークの階層毎に、前記教師データの生成を行う
     ことを含む請求項1に記載の教師データ生成方法。
    The teacher data generation method according to claim 1, comprising: generating the teacher data for each layer of the convolutional neural network.
  6.  画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行う手順と、
     前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する手順と
     をコンピュータに実行させる教師データ生成プログラム。
    a procedure of shape transformation for a default box placed on a feature map extracted from an image and label data given to an object in the image;
    A training data generation program for causing a computer to execute a procedure of determining a default box to be ground truth of the image by matching the default box after the shape transformation with the label data and generating training data.
  7.  画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
     前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成し、
     前記教師データを使用して畳み込みニューラルネットワークを学習し、
     前記畳み込みニューラルネットワークに入力される画像から物体を検出する情報処理部
     を備える情報処理装置。
    performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
    determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate teacher data;
    training a convolutional neural network using the training data;
    An information processing apparatus comprising an information processing unit that detects an object from an image input to the convolutional neural network.
  8.  コンピュータが実行する情報処理方法であって、
     画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行い、
     前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成し、
     前記教師データを使用して畳み込みニューラルネットワークを学習し、
     前記畳み込みニューラルネットワークに入力される画像から物体を検出する
     ことを含む情報処理方法。
    A computer-executed information processing method comprising:
    performing shape transformation on a default box placed on a feature map extracted from an image and label data given to an object in the image;
    determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate teacher data;
    training a convolutional neural network using the training data;
    An information processing method comprising detecting an object from an image input to the convolutional neural network.
  9.  画像から抽出される特徴マップ上に配置されるデフォルトボックスおよび前記画像における物体に付与されるラベルデータに対して形状変換を行う手順と、
     前記形状変換後の前記デフォルトボックスとラベルデータとのマッチングによって、前記画像のグランドトゥルースにするデフォルトボックスを決定して教師データを生成する手順と、
     前記教師データを使用して畳み込みニューラルネットワークを学習する手順と、
     前記畳み込みニューラルネットワークに入力される画像から物体を検出する手順と
     をコンピュータに実行させる情報処理プログラム。
    a procedure of shape transformation for a default box placed on a feature map extracted from an image and label data given to an object in the image;
    a step of determining a default box to be the ground truth of the image by matching the default box after the shape transformation and the label data to generate training data;
    training a convolutional neural network using the training data;
    An information processing program for causing a computer to execute a procedure for detecting an object from an image input to the convolutional neural network.
PCT/JP2022/041036 2021-11-09 2022-11-02 Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program WO2023085190A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023559595A JPWO2023085190A1 (en) 2021-11-09 2022-11-02

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021182769 2021-11-09
JP2021-182769 2021-11-09

Publications (1)

Publication Number Publication Date
WO2023085190A1 true WO2023085190A1 (en) 2023-05-19

Family

ID=86335919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/041036 WO2023085190A1 (en) 2021-11-09 2022-11-02 Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program

Country Status (2)

Country Link
JP (1) JPWO2023085190A1 (en)
WO (1) WO2023085190A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027532A (en) * 2019-12-11 2020-04-17 上海眼控科技股份有限公司 System and method for identifying tax amount of insurance policy vehicle and ship for forced insurance
CN111163628A (en) * 2017-05-09 2020-05-15 蓝河技术有限公司 Automatic plant detection using image data
US20200258313A1 (en) * 2017-05-26 2020-08-13 Snap Inc. Neural network-based image stream modification
CN112446231A (en) * 2019-08-27 2021-03-05 丰图科技(深圳)有限公司 Pedestrian crossing detection method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
JPWO2023085190A1 (en) 2023-05-19

Similar Documents

Publication Publication Date Title
US10754347B2 (en) Vehicle control device
US20190347492A1 (en) Vehicle control device
US11507110B2 (en) Vehicle remote assistance system, vehicle remote assistance server, and vehicle remote assistance method
WO2021241189A1 (en) Information processing device, information processing method, and program
US11556127B2 (en) Static obstacle map based perception system
US20240054793A1 (en) Information processing device, information processing method, and program
WO2019150918A1 (en) Information processing device, information processing method, program, and moving body
US20210362727A1 (en) Shared vehicle management device and management method for shared vehicle
JP2022098397A (en) Device and method for processing information, and program
JP2018180641A (en) Vehicle identification device
WO2023153083A1 (en) Information processing device, information processing method, information processing program, and moving device
WO2023085190A1 (en) Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program
US20230245423A1 (en) Information processing apparatus, information processing method, and program
WO2023085017A1 (en) Learning method, learning program, information processing device, information processing method, and information processing program
JP7491267B2 (en) Information processing server, processing method for information processing server, and program
WO2023149089A1 (en) Learning device, learning method, and learning program
WO2023162497A1 (en) Image-processing device, image-processing method, and image-processing program
WO2023090001A1 (en) Information processing device, information processing method, and program
WO2023063145A1 (en) Information processing device, information processing method, and information processing program
WO2024024471A1 (en) Information processing device, information processing method, and information processing system
WO2022075039A1 (en) Information processing device, information processing system, and information processing method
KR102388625B1 (en) Autonomous vehicle for field learning with artificial intelligence applied
WO2023145460A1 (en) Vibration detection system and vibration detection method
WO2023054090A1 (en) Recognition processing device, recognition processing method, and recognition processing system
WO2023074419A1 (en) Information processing device, information processing method, and information processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892689

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023559595

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE