WO2018066352A1

WO2018066352A1 - Image generation system, program and method, and simulation system, program and method

Info

Publication number: WO2018066352A1
Application number: PCT/JP2017/033729
Authority: WO
Inventors: 福原　隆浩; 中西　康之; 隆河原
Original assignee: 株式会社アドバンスド・データ・コントロールズ
Priority date: 2016-10-06
Filing date: 2017-09-19
Publication date: 2018-04-12

Abstract

[Problem] To generate, with CG synthesis techniques, near infrared sensor or LiDAR laser light sensor images extremely similar to actually imaged images, and to use said CG-synthesized images to make it possible to run simulations of a recognition function module on images which change with displacement in vehicle position information. [Solution] This system uses computer graphics techniques to generate a virtual sensor image. The computer graphics include: a means for creating a scenario of an object present in the image; a means for performing modeling for each object in the computer graphics on the basis of a scenario; a means for performing shading for each model of the modeling result; a means for outputting only one component of a shaded image; and a means for generating a depth image on the basis of three-dimensional shape information for each object in the computer graphics.

Description

Image generation system, program and method, and simulation system, program and method

The present invention relates to an image generation system, program, and method for generating a virtual image of a near-infrared sensor or a LiDAR laser light sensor, and to a simulation system, simulation program, and simulation method that use the virtual image to simulate a recognition function module on images that change with displacement of vehicle position information.

Currently, to realize automatic driving of automobiles, for example through ADAS (advanced driver assistance systems) that detect and avoid potential accidents in advance, many tests are actively being conducted in which image recognition technology is applied to actual video from cameras mounted on the vehicle to automatically detect objects such as other vehicles, pedestrians, and traffic lights, and to perform control such as deceleration and avoidance. In such experimental systems, it is particularly important that the whole system be controlled synchronously, with real-time performance and a high recognition rate.

A conventional example of an automatic driving support system is the travel control system disclosed in Patent Document 1. That system detects road markings around the host vehicle, such as lane markers and stop positions, as well as three-dimensional objects around the host vehicle, such as moving objects and obstacles. Its stated purpose is to realize an automatic driving system that determines the travelable area on the road and avoids collisions with three-dimensional objects such as traffic lights and signs while traveling along a preset route.

To recognize the surroundings with in-vehicle sensors and control the vehicle accordingly, it is necessary to identify the obstacles and the various types of moving bodies around the host vehicle, such as vehicles, bicycles, and pedestrians, and to detect information such as their positions and speeds. It is also necessary to determine the meaning of road paint, such as lane markers and stop lines, and of signs as the vehicle travels. Image recognition using a camera's image sensor has been considered effective as an in-vehicle sensor for detecting such external information around the host vehicle.

Patent Document 1: JP 2016-99635 A

To realize automatic driving, the vehicle itself must recognize the surrounding environment, which requires accurately measuring the distance from the vehicle to surrounding objects. The following technologies have been developed for distance measurement, and many commercial vehicles already carry such sensors to provide driving-assist functions such as lane keeping, cruise control, and automatic braking.

・Stereo camera: calculates distance by triangulation using two cameras, in the same way as the human eyes.
・Infrared depth sensor: projects an infrared pattern, photographs its reflection with an infrared camera, and calculates distance from the deviation (phase difference).
・Ultrasonic sensor: calculates distance from the time between transmitting an ultrasonic wave and receiving its reflection.
・Millimeter-wave radar: uses the same mechanism as the ultrasonic sensor, calculating distance from the time between transmitting a millimeter wave and receiving its reflection.
・LiDAR (Light Detection and Ranging): uses the same mechanism as the ultrasonic sensor and millimeter-wave radar, but with laser light. Distance is calculated from the time taken to receive the reflected wave (TOF: Time of Flight).

As described above, there are several methods, each with advantages and disadvantages. A stereo camera makes accurate distance measurement by stereoscopic vision easy, but the two cameras must be at least 30 cm apart, which limits downsizing.

Infrared depth sensors and ultrasonic sensors are inexpensive, but their signals attenuate strongly with distance. When the object is several tens of meters or more away, accurate measurement becomes difficult or measurement itself becomes impossible. Millimeter-wave radar and LiDAR, by contrast, attenuate little even over long distances and therefore allow high-precision long-range measurement. These devices are currently expensive and difficult to downsize, but continued research and development is expected to accelerate their adoption in vehicles.

From the above, selectively using different sensors is currently the realistic way to measure distance accurately from short range to long range.
Promising applications other than automatic driving of automobiles include detecting head movement to prevent the driver from falling asleep, gesture detection, and obstacle avoidance for autonomous robots.

For future automatic driving, it is indispensable to collect a large number of real images taken by the various sensors described above and to raise the image recognition rate using deep learning recognition technology.

However, it is practically impossible to collect test data by driving a vehicle indefinitely, so the problem is how to carry out this verification at a level of realism that can actually substitute for real driving. For example, when image recognition is applied to camera images to recognize the external environment, the recognition rate changes significantly with external factors such as the weather around the vehicle (rain, fog, etc.) and the time of day (night, twilight, backlight, etc.), which affects the detection result. As a result, false detections and missed detections by the image recognition means increase for moving objects, obstacles, and road paint around the host vehicle. Such false and missed detections can be addressed by increasing the number of learning samples for deep learning (machine learning), the technique with the highest recognition rate.

However, there is a limit to how many learning samples can be extracted while actually driving on the road. Waiting for severe conditions such as rain, backlight, or fog before running driving tests and collecting samples is not a realistic development method, because such conditions are difficult to reproduce and occur only rarely.

Moreover, image recognition alone is not sufficient to realize fully automatic driving in the future. A camera image is two-dimensional: objects such as vehicles, pedestrians, and traffic lights can be extracted by image recognition, but the distance to the object cannot be obtained for each pixel. Sensors using LiDAR laser light and sensors using near-infrared light are promising ways to meet this requirement. Combining several different types of sensors can therefore significantly improve safety while the automobile is traveling.

The present invention therefore solves the above problems. Its object is to improve the recognition rate for other vehicles around the host vehicle, obstacles on the road, and objects such as pedestrians, and to improve the realism of vehicle driving tests and sample collection by artificially generating images that closely resemble real images under conditions that are difficult to reproduce. Another object is to construct a plurality of different types of sensors in a virtual environment and to generate the image of each using CG technology. A further object is to provide a simulation system, simulation program, and simulation method for synchronous control using the generated CG images.

In order to solve the above problems, the present invention provides a system, a program, and a method for generating, as computer graphics, a virtual image to be input to a sensor means, comprising:
A scenario creation unit that creates a scenario relating to the arrangement and behavior of objects present in the virtual image;
A 3D modeling unit that performs modeling for each object based on the scenario;
A 3D shading unit that performs shading for each model generated by the modeling unit and generates a shading image for each model;
A component extraction unit that extracts a predetermined component included in the shading image and outputs it as a component image; and
A depth image generation unit that generates a depth image in which depth is defined based on information on the three-dimensional shape of each object in the component image.

In the above invention, the component is preferably an R component of the RGB image.
In the above invention, it is also preferable to further provide a grayscale conversion unit that converts the component to grayscale.

The present invention also provides a system, a program, and a method for generating, as computer graphics, a virtual image to be input to a sensor means, comprising:
A scenario creation unit that creates a scenario relating to the arrangement and behavior of objects present in the virtual image;
A 3D modeling unit that performs modeling for each object based on the scenario;
A 3D shading unit that performs shading for each model generated by the modeling unit and generates a shading image for each model; and
A depth image generation unit that generates a depth image in which depth is defined based on information on the three-dimensional shape of each object,
wherein the shading unit has:
A function of performing shading only on a predetermined part of the model at which light rays emitted from the sensor means are reflected; and
A function of outputting only the three-dimensional shape of the predetermined part,
and the depth image generation unit generates a depth image for each object based on information on the three-dimensional shape of the predetermined part.

In the above invention, the sensor means is preferably a near infrared sensor. In the above invention, the sensor means is preferably a LiDAR sensor that detects reflected light of the irradiated laser light.

In the above invention, the scenario creation means preferably includes means for determining three-dimensional shape information of the object, operation information of the object, material information of the object, parameter information of the light source, camera position information, and sensor position information.

In the above invention, it is preferable to further include deep learning recognition learning means that acquires a component image and a depth image based on a real image as teacher data, and trains a neural network by backpropagation based on the component image, the depth image generated by the depth image generation unit, and the teacher data.

In the above invention, it is preferable to provide deep learning recognition learning means that acquires an irradiation image and a depth image based on a real image as teacher data, and trains a neural network by backpropagation based on the image obtained as a result of shading by the shading unit, the depth image generated by the depth image generation unit, and the teacher data.

In the above invention, it is preferable to further include:
A TOF calculation unit that calculates, from the depth image generated by the depth image generation unit, the time from irradiation of the light ray to reception of its reflected wave as TOF information;
A distance image generation unit that generates a distance image based on the TOF information from the TOF calculation unit; and
A comparison evaluation unit that compares the degree of coincidence between the distance image generated by the distance image generation unit and the depth image generated by the depth image generation unit.

In the above invention, it is preferable that the modeling unit has a function of acquiring the comparison result from the comparison evaluation unit as feedback information, adjusting the modeling conditions based on the acquired feedback information, and performing modeling again.

In the above invention, it is preferable that the modeling and the acquisition of feedback information based on the comparison are repeated until the matching error in the comparison result from the comparison evaluation unit becomes smaller than a predetermined threshold value.
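The TOF calculation, distance image generation, comparison, and feedback loop described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the function names, the optional timer quantization `tick_s`, the mean-absolute-difference error metric, and the `render_depth` and `update` callbacks are all hypothetical and are not specified in this document.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def depth_to_tof(depth_m, tick_s=None):
    """Round-trip time of flight (s) per pixel of a depth image given in metres.

    tick_s is an assumed timer resolution; if given, TOF is rounded to it.
    """
    tof = 2.0 * np.asarray(depth_m, dtype=np.float64) / C
    if tick_s is not None:
        tof = np.round(tof / tick_s) * tick_s
    return tof


def tof_to_distance(tof_s):
    """Distance image (m) recovered from round-trip TOF values."""
    return C * np.asarray(tof_s, dtype=np.float64) / 2.0


def matching_error(distance_img, depth_img):
    """Assumed metric: mean absolute difference between the two images (m)."""
    return float(np.mean(np.abs(np.asarray(distance_img) - np.asarray(depth_img))))


def refine_until_match(render_depth, reference, params, update, threshold, max_iter=50):
    """Repeat modeling and comparison until the error is below the threshold.

    render_depth(params) and update(params, reference, depth) are
    hypothetical stand-ins for the modeling unit and its feedback adjustment.
    """
    for _ in range(max_iter):
        depth = render_depth(params)
        err = matching_error(reference, depth)
        if err < threshold:
            return params, err, True
        params = update(params, reference, depth)
    return params, matching_error(reference, render_depth(params)), False
```

With a timer resolution of 1 ns, for example, the recovered distance can differ from the original depth by up to about 7.5 cm, which is the kind of mismatch a comparison step would measure.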

Furthermore, the present invention provides a simulation system, program, and method for a recognition function module operating on images that change with displacement of vehicle position information, comprising:
Position information acquisition means for acquiring position information of the vehicle relative to surrounding objects based on a detection result from the sensor means;
Image generation means for generating, based on the position information acquired by the position information acquisition means, a simulation image that reproduces the area specified by the position information;
Image recognition means for recognizing and detecting a specific object in the simulation image generated by the image generation means using the recognition function module;
Position information calculation means for generating a control signal for controlling the operation of the vehicle using the recognition result from the image recognition means, and for changing or correcting the position information of the host vehicle based on the generated control signal; and
Synchronization control means for synchronously controlling the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means.

In the above invention, the synchronization control means preferably further comprises:
Means for packetizing the position information in a specific format and sending it;
Means for transmitting the packetized data via a network or a transmission bus within a specific device;
Means for receiving and de-packetizing the packet data; and
Means for receiving the de-packetized data as input and generating an image.

In the above invention, it is preferable that the synchronization control means uses UDP (User Datagram Protocol) to exchange signals between the respective means.
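As an illustration of packetizing position information and exchanging it over UDP, a minimal sketch might look like the following. The packet layout (a frame counter followed by vehicle XYZ coordinates, Euler angles, and one wheel rotation angle), the field order, and all function names are assumptions made for this example; the document does not define the actual format.

```python
import socket
import struct

# Hypothetical packet layout (not defined in this document): little-endian,
# uint32 frame counter, vehicle XYZ (3 doubles), Euler angles roll/pitch/yaw
# (3 doubles), wheel rotation angle (1 double).
PACKET_FORMAT = "<I3d3dd"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)


def pack_position(frame, xyz, euler, wheel_angle):
    """Packetize one position sample into a fixed-size byte string."""
    return struct.pack(PACKET_FORMAT, frame, *xyz, *euler, wheel_angle)


def unpack_position(payload):
    """De-packetize a payload produced by pack_position."""
    values = struct.unpack(PACKET_FORMAT, payload)
    return {
        "frame": values[0],
        "xyz": values[1:4],
        "euler": values[4:7],
        "wheel_angle": values[7],
    }


def send_and_receive_once(packet):
    """Send one datagram over the loopback interface and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # let the OS choose a free port
        rx.settimeout(1.0)
        tx.sendto(packet, rx.getsockname())
        data, _addr = rx.recvfrom(PACKET_SIZE)
        return data
```

Because UDP provides no delivery or ordering guarantees, a frame counter such as the assumed `frame` field is one simple way for a receiver to detect dropped or reordered packets.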

The vehicle position information preferably includes any of the XYZ coordinates of the vehicle's absolute position on the road surface, the XYZ coordinates of the tires' absolute positions on the road surface, the Euler angles of the vehicle, and the rotation angles of the wheels.
In the above invention, the image generation means preferably includes means for synthesizing the three-dimensional shape of the vehicle by computer graphics.

In the above invention, a plurality of vehicles are set, and the recognition function module is operated for each vehicle,
The position information calculation means uses the recognition results from the recognition means to change or correct the position information of each of the plurality of vehicles,
Preferably, the synchronization control means executes synchronization control of the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means for the plurality of vehicles.

In the above invention, the image generating means preferably includes means for generating a different image for each sensor means.
In the above invention, it is also preferable to provide any one or all of an image sensor means, a LiDAR sensor, a millimeter-wave sensor, and an infrared sensor as the sensor means.

In the above invention, the simulation system preferably includes means for generating images corresponding to a plurality of sensors, recognition means corresponding to each generated image, and means for performing the synchronization control using the plurality of recognition results.

Furthermore, in the invention relating to the simulation system, program, and method, it is preferable that the image generation system, image generation program, and image generation method described above are provided as the image generation means, and
that the depth image generated by the depth image generation unit of the image generation system is input to the image recognition means as the simulation image.

As described above, according to these inventions, when training a recognition function module such as deep learning (machine learning), the number of learning samples can be increased by artificially generating CG images that closely resemble real images, and the recognition rate can be improved by making learning more efficient.

More specifically, the present invention uses means for generating and synthesizing highly realistic CG images, closely resembling real images, from a simulation model based on the displacement of vehicle position information. This makes it possible to artificially generate an unlimited number of images with environments or light sources that do not actually exist. The generated CG image can be input to the recognition function module and processed in the same way as a conventional camera image to test whether the target object can be recognized and extracted. Learning therefore becomes possible on types of images that were previously difficult or impossible to capture, which further increases learning efficiency and improves the recognition rate.

In addition, different types of sensors that can extract information on the three-dimensional shape of an object, such as LiDAR (laser light) and millimeter-wave sensors, are used together with image sensors that acquire two-dimensional images, and images of these sensors are generated. This not only enables a wider range of tests but can also be expected to produce a synergistic effect in which the recognition technology is improved at the same time.

The fields of application of the present invention include experimental apparatus and simulators for the automatic driving of automobiles, the related software modules and hardware devices (for example, a vehicle-mounted camera, an image sensor, and sensing of the three-dimensional shape around the vehicle), and machine learning software such as deep learning. In addition, because the present invention provides CG technology that renders realistic images in real time together with synchronization control technology, it can be widely applied in fields other than the automatic driving of automobiles. Promising fields of use include, for example, surgical simulators, military simulators, and safe-operation tests for robots, drones, and the like.

FIG. 1 is a block diagram illustrating the overall configuration of an image generation system that generates a virtual image according to the first embodiment.
An explanatory diagram showing the process of driving on actual roads and collecting three-dimensional data.
An explanatory diagram showing the collection of the three-dimensional shape of a test vehicle.
A grayscale image acquired with a near-infrared sensor.
A distance image acquired with a near-infrared sensor.
A block diagram showing the overall configuration of the image generation system that generates a virtual image according to the second embodiment.
An explanatory diagram showing the TOF of laser light.
An explanatory diagram showing the structure and operating principle of LiDAR.
An explanatory diagram showing LiDAR beam irradiation.
An explanatory diagram showing how LiDAR laser light is irradiated onto a target object.
A block diagram explaining the neural network and backpropagation according to the third embodiment.
An explanatory diagram explaining a neural network.
A block diagram explaining the depth image quality evaluation system according to the fourth embodiment.
An explanatory diagram explaining the concept of TOF and the relationship between the projected light pulse and the received light pulse.
A block diagram explaining the synchronous simulation system according to the fifth embodiment.
A block diagram explaining the client-side configuration.
A block diagram explaining the simulator-server-side configuration.
A flowchart explaining the configurations of UDP synchronous control, image generation, and image recognition.
A flowchart showing the operation of the synchronous control simulator.
A block diagram explaining the multiple synchronous simulation system according to the sixth embodiment.
A block diagram explaining a plurality of UDP synchronous control systems.
A block diagram explaining the multiple deep learning recognition means according to the seventh embodiment.
A block diagram explaining the multiple deep learning recognition means provided with the material photographing means.

[First Embodiment]
(Overall configuration of near-infrared virtual image generation system)
Embodiments of the near-infrared virtual image generation system according to the present invention are described in detail below with reference to the accompanying drawings. In the present embodiment, a system is constructed that uses CG technology to generate images closely resembling real images, as a substitute for real images captured by the various sensors essential for automatic driving. FIG. 1 is a block diagram of the system for generating a near-infrared virtual image.

The near-infrared virtual image generation system according to the present embodiment is implemented, for example, by executing software installed on a computer so that various virtual modules are constructed on an arithmetic processing device such as the computer's CPU. In the following description, “module” refers to a functional unit that is configured by hardware such as an apparatus or a device, by software having the function, or by a combination of these, and that achieves a predetermined operation.

As shown in FIG. 1, the near-infrared virtual image generation system according to the present embodiment includes a scenario creation unit 10, a 3D modeling unit 11, a 3D shading unit 12, an R image grayscale conversion unit 13, and a depth image generation unit 14.

The scenario creation unit 10 is a means for creating scenario data that specifies what kind of CG is to be created. It includes means for determining three-dimensional shape information of objects, operation information of objects, material information of objects, parameter information of light sources, camera position information, and sensor position information. For example, CG used for automatic driving contains many objects in the virtual space, such as roads, buildings, vehicles, pedestrians, bicycles, roadside strips, and traffic lights. The scenario data defines where each object is located (coordinates, altitude), in which direction and how it moves, the position of the virtual camera (viewpoint) in the virtual space, the type and number of light sources with the position and orientation of each, and the movement and behavior of the objects in the virtual space.
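One way to picture such scenario data is as a small set of records. The following is a hypothetical sketch only: the class names, field names, file name, and the constant-velocity motion model are assumptions for illustration and are not taken from this document.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    name: str
    shape: str              # reference to a 3D shape model (hypothetical identifier)
    material: str           # material information, e.g. for reflectance
    position: Vec3          # coordinates in the virtual space
    velocity: Vec3 = (0.0, 0.0, 0.0)  # assumed constant-velocity motion

    def position_at(self, t: float) -> Vec3:
        """Object position after t seconds under the assumed motion model."""
        return tuple(p + v * t for p, v in zip(self.position, self.velocity))


@dataclass
class LightSource:
    kind: str               # e.g. "sun" or "headlight"
    position: Vec3
    direction: Vec3
    intensity: float


@dataclass
class Scenario:
    objects: List[SceneObject] = field(default_factory=list)
    lights: List[LightSource] = field(default_factory=list)
    camera_position: Vec3 = (0.0, 0.0, 0.0)
    sensor_position: Vec3 = (0.0, 0.0, 0.0)


# Example: one pedestrian crossing 10 m ahead under a single light source.
scenario = Scenario(
    objects=[SceneObject("pedestrian", "pedestrian.obj", "cloth",
                         (0.0, 0.0, 10.0), (1.0, 0.0, 0.0))],
    lights=[LightSource("sun", (0.0, 100.0, 0.0), (0.0, -1.0, 0.0), 1.0)],
    camera_position=(0.0, 1.2, 0.0),
    sensor_position=(0.0, 1.5, 0.0),
)
```

A modeling stage could then iterate over `scenario.objects` and load or build a shape model for each entry.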

The scenario creation unit 10 first determines what kind of CG image is to be generated. The 3D modeling unit 11 then generates a 3D image according to the scenario set by the scenario creation unit 10.

The 3D modeling unit 11 is a module that creates the shapes of objects in the virtual space. It constructs a three-dimensional object shape by setting the coordinates of each vertex that makes up the object's outer shape and internal structure, and the parameters of the equations that express the boundary lines and surfaces of the shape. Specifically, the 3D modeling unit 11 models information such as the 3D shape of a road, the 3D shape of a vehicle traveling on the road, and the 3D shape of a pedestrian.

The 3D shading unit 12 is a module that generates the actual 3DCG from each piece of 3D model data D101 generated by the 3D modeling unit 11. Through shading processing it renders the shadows of the objects in the 3DCG and generates a three-dimensional, realistic image according to the position of the light source and the intensity of the light.

The R image grayscale conversion unit 13 is a module that functions both as a component extraction unit, which extracts a predetermined component from the shading image sent from the 3D shading unit 12, and as a grayscale conversion unit, which converts the extracted component image to grayscale. Specifically, it extracts the R component of the shading image D103, an RGB image sent from the 3D shading unit 12, as a component image, converts that component image to grayscale, and outputs a grayscale image D104 (Img(x, y), x: horizontal coordinate value, y: vertical coordinate value) like the one shown in FIG. 4. Because only the R (red) component of the shading image D103 is extracted, the result is an image very close to an infrared image. FIG. 4 is a black-and-white image obtained by converting to grayscale a real image of a room captured with a near-infrared sensor.
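The R-component extraction can be illustrated with a short sketch. It assumes the shading image is an H x W x 3 array in RGB channel order with 8-bit values; the function name is hypothetical and the actual data format of D103 is not specified in this document.

```python
import numpy as np


def r_component_grayscale(shading_rgb):
    """Return Img(x, y): the R channel of an RGB image as an 8-bit grayscale image.

    Assumes an H x W x 3 array in RGB channel order with values in 0..255.
    """
    rgb = np.asarray(shading_rgb)
    if rgb.ndim != 3 or rgb.shape[2] < 3:
        raise ValueError("expected an H x W x 3 RGB image")
    # Keep only channel 0 (R); the G and B components are discarded.
    return rgb[:, :, 0].astype(np.uint8).copy()
```

Note that this differs from a conventional luminance conversion, which would mix all three channels; here the single R channel is used directly as the gray level.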

The depth image generation unit 14 is a module that acquires the 3D shape data of each object on the screen from the modeling information D102 of each 3D shape model input from the 3D shading unit 12, and generates a depth image D105 (also called a depth map) based on the distance to each object. FIG. 5 shows such a depth image color-coded by distance: the red component is stronger the nearer the object and the blue component is stronger the farther away it is, while objects at intermediate distances range from yellow to green. Depth information is thus obtained for all objects on the screen.
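A color coding of this kind (red for near, through yellow and green, to blue for far) can be approximated by sweeping the hue of an HSV color. The sketch below is an assumption about how such a visualization might be produced, not the method used for FIG. 5; the function name and the `near` and `far` limits are hypothetical.

```python
import colorsys

import numpy as np


def colorize_depth(depth, near, far):
    """Map a depth image to RGB: red (near) -> yellow -> green -> blue (far)."""
    depth = np.asarray(depth, dtype=np.float64)
    t = np.clip((depth - near) / (far - near), 0.0, 1.0)  # 0 = near, 1 = far
    out = np.zeros(depth.shape + (3,), dtype=np.uint8)
    for idx in np.ndindex(depth.shape):
        # Hue 0 is red, 1/6 yellow, 1/3 green, 2/3 blue.
        hue = t[idx] * (240.0 / 360.0)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        out[idx] = (round(r * 255), round(g * 255), round(b * 255))
    return out
```

The per-pixel loop keeps the example short and readable; a vectorized lookup table would be preferable for full-size images.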

(Near-infrared virtual image generation system operation)
The near-infrared virtual image generation method of the present invention can be implemented by operating the near-infrared virtual image generation system having the above configuration.

First, the scenario creation unit 10 creates a scenario specifying what kind of CG is to be created. For example, for CG used for automatic driving, the scenario specifies where the many objects such as roads, buildings, vehicles, pedestrians, bicycles, roadside strips, and traffic lights are located and in which direction they move, together with the position of the camera and the type and number of light sources.

The scenario creation unit 10 determines what CG image to generate. Next, information such as the 3D shape of the road, the 3D shape of the vehicle traveling on the road, and the 3D shape of the pedestrian is modeled according to the scenario set by the scenario creation unit 10. As a modeling means, a road, for example, can be realized easily by using a “high-precision map database”: a large number of vehicles equipped with an in-vehicle device 1b are driven to collect data, as shown in the figure; as shown in (b), the map is converted into 3D from the collected data; and as shown in (c), each road feature is linked into a database using vectorized drawings.

Next, the 3D modeling unit 11 acquires or generates a required 3D shape model of each target object based on the scenario information D100 created by the scenario creation unit 10. Then, the 3D shading unit 12 generates an actual 3DCG using each 3D model data D101 generated by the 3D modeling unit 11.

Further, the shading image D103 sent from the 3D shading unit 12 is converted to an R-component grayscale image as shown in FIG. 4, and a grayscale image D104 (Img(x, y), x: horizontal coordinate value, y: vertical coordinate value) is output. Meanwhile, the modeling information D102 of each 3D shape model is obtained from the 3D shading unit 12, the 3D shape data of each object on the screen is derived from it, and on that basis the depth image generation unit 14 generates a depth image D105 (α(x, y), x: horizontal coordinate value, y: vertical coordinate value).

Then, the grayscale image D104 and the depth image D105 obtained by the above operation are output as output images of the present embodiment, and these two image outputs are used for image recognition.

[Second Embodiment]
Hereinafter, a second embodiment of a system according to the present invention will be described in detail with reference to the accompanying drawings. In the present embodiment, the same components as those in the first embodiment described above are denoted by the same reference numerals, and the functions and the like are the same unless otherwise specified, and the description thereof is omitted.

(Overall configuration of LiDAR sensor virtual image generation system)
In the present embodiment, a case where a LiDAR sensor is used will be described. As shown in FIG. 6, the system according to the present embodiment includes a scenario creation unit 10, a 3D modeling unit 11, a shading unit 15, and a depth image generation unit 16.

In this embodiment, the shading unit 15 is a module that generates the actual 3DCG from each piece of 3D model data D101 generated by the 3D modeling unit 11. Through shading processing it renders the shadows of the objects in the 3DCG and generates a three-dimensional, realistic image according to the position of the light source and the intensity of the light. In particular, the shading unit 15 in the present embodiment has a laser light irradiation part extraction unit 15a, which extracts the 3D shape of only the part irradiated with the laser light, performs shading on it, and outputs the shading image D106. Because the reflected laser light has no color components such as RGB in the first place, the shading unit 15 outputs the shading image D106 directly as a grayscale image.

Further, the depth image generation unit 16 is a module that acquires 3D shape data of each object on the screen based on the modeling information D102 of each 3D shape model input from the 3D shading unit 12, and generates a depth image (also called a depth map) 105 based on the distance to each object. In particular, the depth image generation unit 16 in the present embodiment outputs a depth image D108 in which only the portion related to laser light irradiation is extracted by the laser light irradiation part extraction unit 16a.

(Operation of LiDAR sensor virtual image generation system)
Next, the operation of the LiDAR sensor virtual image generation system having the above configuration will be described.

With near-infrared light, an image of the 3D-modeled objects such as that shown in FIG. 5 can be acquired all at once, whereas the laser beam of the LiDAR sensor is highly directional and therefore illuminates only a part of the screen. LiDAR is a sensor that measures the distance to a distant object by measuring the scattered light returned in response to pulsed laser irradiation. It is attracting particular attention as one of the essential sensors for improving the accuracy of automated driving. The basic features of LiDAR are described below.

The laser light used for LiDAR is near-infrared light (for example, a wavelength of 905 nm) emitted as micro pulses. The scanner and optical system are composed of a motor, a mirror, a lens, and the like. Meanwhile, the light receiver and the signal processing unit receive the reflected light and calculate the distance by signal processing.

Here, one method adopted in LiDAR is the TOF (Time of Flight) method. In a LiDAR scanning device 114 using this method, as shown in the figure, laser light is output from the light emitting element 114b through the irradiation lens 114c as an irradiation pulse Pl1 under the control of the laser driver 114a. The irradiation pulse Pl1 is reflected by the measurement object Ob1, enters the light receiving lens 114d as a reflection pulse Pl2, and is detected by the light receiving element 114e. The detection result from the light receiving element 114e is output from the LiDAR scanning device 114 as an electric signal by the signal light receiving circuit 114f. In such a LiDAR scanning device 114, an ultrashort pulse having a rise time of several ns and an optical peak power of several tens of watts is emitted toward the measurement object, and the time t until the pulse is reflected by the measurement object and returns to the light receiving element is measured. If the distance to the object is L and the speed of light is c, L is calculated by:
L = (c × t) / 2
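
As a worked example of the relation L = (c × t) / 2, the following sketch converts a measured round-trip time into a distance. The constant and function names are illustrative, and the 200 ns sample value is an assumption chosen for the example, not a figure from the embodiment.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # c in vacuum


def tof_to_distance(t_seconds: float) -> float:
    """Distance L to the reflecting object for a round-trip time t: L = (c * t) / 2."""
    return SPEED_OF_LIGHT_M_PER_S * t_seconds / 2.0


# A pulse that returns after 200 ns has travelled to the object and back,
# so the object is roughly 30 m away.
distance_m = tof_to_distance(200e-9)
print(f"{distance_m:.3f} m")
```

The division by two reflects that t covers the outbound and return paths of the pulse.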

As shown in FIGS. 8A to 8C, the basic operation of the LiDAR system is as follows. The laser beam emitted from the LiDAR scanning device 114 is reflected by the rotating mirror 114g and is swept left and right, or rotated through 360°, to scan the scene. The reflected light that returns is captured again by the light receiving element 114e of the LiDAR scanning device 114. The captured reflected light is finally obtained as point cloud data PelY and PelX indicating the signal intensity corresponding to the rotation angle. In a rotary LiDAR system such as that shown in FIG. 9, for example, the central portion rotates while emitting laser light, enabling 360-degree scanning.
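
One way to picture how scan samples become a point cloud is sketched below. The text describes PelX and PelY only as signal intensity against rotation angle. Pairing each angle with a TOF-derived range and converting to planar Cartesian coordinates is an assumption made here for illustration, as is the function name `scan_to_points`.

```python
import math
from typing import List, Tuple

ScanSample = Tuple[float, float, float]  # (rotation angle in degrees, range in m, intensity)
Point = Tuple[float, float, float]       # (x in m, y in m, intensity)


def scan_to_points(samples: List[ScanSample]) -> List[Point]:
    """Convert per-angle range samples of one horizontal sweep into 2D points.

    Assumption: the sensor sits at the origin and angle 0 points along +x.
    """
    points = []
    for angle_deg, range_m, intensity in samples:
        theta = math.radians(angle_deg)
        points.append((range_m * math.cos(theta), range_m * math.sin(theta), intensity))
    return points


sweep = [(0.0, 10.0, 0.8), (90.0, 5.0, 0.3)]
for x, y, intensity in scan_to_points(sweep):
    print(f"x={x:.2f} y={y:.2f} intensity={intensity}")
```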

As described above, because the laser beam of the LiDAR sensor is highly directional, it illuminates only a part of the screen even when it is emitted over a long distance. Therefore, in the shading unit 15 shown in FIG. 6, the laser light irradiation part extraction unit 15a extracts the 3D shape of only the portion irradiated with the laser beam, shading is performed on it, and the shading image D106 is output.

On the other hand, in the depth image generation unit 16, which has also received the 3D shape data D107 of the laser light irradiation portion, the laser light irradiation portion extraction unit 16a outputs a depth image D108 in which only the portion related to laser light irradiation is extracted. FIG. 10 shows an example in which the laser irradiation portion is extracted: laser beams are emitted through 360 degrees from the LiDAR attached to the top of the running vehicle at the center of the image. In the example shown in the figure, a car is detected from the reflected light of the beam at the upper left of the screen, and a pedestrian is detected from the reflected light of the beam at the upper right of the screen.
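
The extraction of only the irradiated portion can be sketched as a per-pixel mask applied to a full-screen depth image. How the irradiated region is determined is not specified in the text, so the mask is taken here as a given input. The names `extract_irradiated`, `full_depth`, and `hit_mask` are hypothetical.

```python
from typing import List, Optional


def extract_irradiated(
    depth: List[List[float]], mask: List[List[bool]]
) -> List[List[Optional[float]]]:
    """Keep depth values only where the laser hit; mark all other pixels as None."""
    return [
        [value if hit else None for value, hit in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth, mask)
    ]


full_depth = [[5.0, 7.5], [12.0, 3.0]]       # depth in metres for every pixel
hit_mask = [[True, False], [False, True]]    # True where laser light reaches the object
print(extract_irradiated(full_depth, hit_mask))
```

Using None rather than zero for non-irradiated pixels keeps "no return" distinct from a measured distance of zero. This is a design choice of the sketch, not something stated in the source.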

Therefore, for example, the shading unit 15 may generate an image obtained by shading the 3D shape of the automobile shown in FIG. 10 using 3DCG technology. Note that, in the first embodiment (FIG. 1) described above, the RGB image is generated internally and then only the R component is output. In this embodiment, however, the reflected laser light has no color components such as RGB in the first place, so the shading image D106 is output directly in grayscale. Likewise, whereas the depth image described in the first embodiment covers the entire screen, the depth image generation unit 16 generates the depth image D108 only for the portion that reflects the laser light.

The gray scaled shading image D106 and depth image D108 obtained by the above operation are transmitted as output images of the present embodiment. These two image outputs can be used for image recognition and learning of the recognition function.

[Third Embodiment]
Next, a virtual image deep learning recognition system according to a third embodiment of the present invention will be described. In the present embodiment, the virtual image system using the near-infrared sensor described in the first embodiment and the virtual image system using the LiDAR sensor described in the second embodiment are applied to AI recognition technology widely used in automated driving and the like, such as a deep learning recognition system, so that virtual environment images of environments that cannot actually be photographed can be supplied for various sensors.

(Configuration of virtual image deep learning recognition system)
FIG. 11 is a configuration diagram of a deep learning recognition system using a back-propagation type neural network that is currently considered to have the highest performance. The deep learning recognition system according to the present embodiment is roughly configured by a neural network calculation unit 17 and a back propagation unit 18.

The neural network calculation unit 17 includes a neural network composed of multiple layers as shown in FIG. 12, and the grayscale image D104 and the depth image D105, which are the outputs shown in FIG. 1, are input to this neural network. Non-linear calculation is then performed based on coefficients (608, 610) set in advance in the neural network, and a final output 611 is obtained.

On the other hand, the backpropagation unit 18 receives a calculation value D110 that is the calculation result of the neural network calculation unit 17, and calculates the error between it and the teacher data to be compared (for example, data such as an irradiation image or a depth image based on a real image). In the system illustrated in FIG. 11, a grayscale image D111 is input as teacher data for the grayscale image D104, and a depth image D112 is input as teacher data for the depth image D105.

Here, the back-propagation unit 18 performs an operation by the back-propagation method. This back-propagation method calculates how much error there is between the output of the neural network and the teacher data, and reversely propagates the result to calculate again from the output in the input direction. In this embodiment, the neural network calculation unit 17 that has received the error value D109 fed back performs a predetermined calculation again and inputs the result to the backpropagation unit 18. The above operations in the loop are executed until the error value becomes smaller than a preset threshold value, and the neural network calculation is terminated when it is determined that the error has sufficiently converged.
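
The loop described above (forward calculation, error against teacher data, feedback, repeat until the error falls below a threshold) can be sketched as follows. To stay short and checkable, the sketch uses a single linear unit in place of the multi-layer network, so its gradient step is only the one-layer special case of back-propagation. The learning rate, threshold, data, and function name are all assumptions made for illustration.

```python
from typing import List, Tuple


def train_until_converged(
    inputs: List[float],
    teacher: List[float],
    lr: float = 0.3,
    threshold: float = 1e-4,
    max_iters: int = 10_000,
) -> Tuple[float, float, float, int]:
    """Fit y = w * x + b, stopping once the mean squared error drops below threshold."""
    w, b = 0.0, 0.0
    n = len(inputs)
    error = float("inf")
    for step in range(max_iters):
        outputs = [w * x + b for x in inputs]              # forward calculation
        diffs = [o - t for o, t in zip(outputs, teacher)]  # output minus teacher data
        error = sum(d * d for d in diffs) / n              # error value fed back
        if error < threshold:                              # converged: stop the loop
            return w, b, error, step
        grad_w = 2.0 * sum(d * x for d, x in zip(diffs, inputs)) / n
        grad_b = 2.0 * sum(diffs) / n
        w -= lr * grad_w                                   # coefficient update
        b -= lr * grad_b
    return w, b, error, max_iters


# Teacher data generated from y = 2x + 1 (illustrative values only).
w, b, error, steps = train_until_converged([0.0, 0.5, 1.0], [1.0, 2.0, 3.0])
print(f"w={w:.3f} b={b:.3f} error={error:.2e} after {steps} iterations")
```

The `max_iters` guard is an addition of the sketch, so that the loop terminates even if the error never reaches the threshold.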

When the above operation is completed, the coefficient values (608, 610) of the neural network in the neural network calculation unit 17 are determined, and deep learning recognition of actual images can be performed using this neural network.

In this embodiment, deep learning recognition of the near-infrared output images described in the first embodiment has been taken as an example. However, deep learning recognition of the output images of the LiDAR sensor of the second embodiment can be handled in exactly the same way using the same method. In this case, the input images at the left end of FIG. 11 are the shading image D106 and the depth image D108 of the second embodiment.

[Fourth Embodiment]
Next, a fourth embodiment of the present invention will be described. Of the output images of the virtual image system using the LiDAR sensor described in the second embodiment, a depth image D108 is output from the depth image generation unit 16. How accurate this depth image is as a distance image that actually assumes laser light is very important as an evaluation point of this simulation system. In the present embodiment, an example in which the present invention is applied to an evaluation system for evaluating this depth image will be described.

(Configuration of depth image evaluation system)
As shown in FIG. 13, the depth image evaluation system according to the present embodiment is configured as an evaluation unit for the depth image D108 output from the depth image generation unit 16 described above, and includes a TOF calculation unit 19, a distance image generation unit 20, and a comparison evaluation unit 21.

The TOF calculation unit 19 is a module that calculates TOF information, including the TOF value and the like, for the depth image D108 generated by the depth image generation unit 16. The TOF corresponds to the delay time, that is, the time difference between the projection pulse being sent from the light source and the pulse reflected by the subject being received by the sensor as a received light pulse. This delay time is output from the TOF calculation unit 19 as a TOF value D113.

The distance image generation unit 20 acquires the TOF of each point of the laser irradiation portion based on the TOF value D113 calculated by the TOF calculation unit 19, calculates the distance L to each point from the delay time at that point, and generates a distance image D114 in which the distance L to each point is represented as an image.

The comparison evaluation unit 21 is a module that performs a comparison calculation between the distance image D114 generated by the distance image generation unit 20 and the depth image D108 input from the depth image generation unit 16, and evaluates the depth image based on the comparison result, including the degree of coincidence between the two. A commonly used measure such as the squared absolute error can be used as the comparison method. The larger the value of this comparison result, the greater the difference between the two, which makes it possible to evaluate how close the depth image based on 3DCG modeling is to the distance image actually generated assuming the TOF of laser light.

(Operation of depth image evaluation system)
Next, the operation of the depth image evaluation system having the above-described configuration will be described.
After the depth image D108 generated by the depth image generation unit 16 is input to the TOF calculation unit 19, the TOF is calculated. This TOF is the time t described earlier for the TOF method. More specifically, as shown in FIG. 14A, when laser light is emitted from the light source in pulse form as a light projection pulse, the pulse is reflected by the subject and received by the sensor as a received light pulse, and the time difference between the two is measured. This time difference corresponds to the delay time between the light projection pulse and the light reception pulse, as shown in the figure.

From the above, the TOF value D113 calculated by the TOF calculation unit 19 in FIG. 6 is output. Once the TOF calculation unit 19 knows the TOF of each point of the laser irradiation portion, the distance L of each point can be obtained by performing reverse calculation using the following equation.
L = (1/2) × c × t
(c: speed of light, t: TOF)
The distance image generation unit 20 generates the distance image D114 for each point of the irradiated portion by the above calculation. Thereafter, a comparison calculation is performed between the depth image D108 and the distance image D114. A commonly used measure such as the squared absolute error may be used as the comparison means. The larger this value, the greater the difference between the two, so it is possible to evaluate how close the depth image based on 3D CG modeling is to the distance image actually generated assuming the TOF of laser light (which is taken as the correct answer).

As described above, the comparison result D115 may be a numerical value such as the squared absolute error, or may be a signal, obtained after threshold processing, indicating that the two do not approximate each other. In the latter case, for example, the result may be fed back to the 3D modeling unit 11. By repeating this processing until a predetermined level of approximation is reached, a highly accurate 3D-CG-based depth image can be generated.
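
The evaluation flow (TOF per point, distance image from L = (1/2) × c × t, comparison with the depth image, optional threshold decision) can be sketched as follows. The text does not detail how the TOF values are obtained, so they are taken as a given input. The sample values, the threshold, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # c in vacuum

Image = List[List[float]]


def distance_image_from_tof(tof_image: Image) -> Image:
    """Apply L = (1/2) * c * t to every point to build the distance image."""
    return [[0.5 * SPEED_OF_LIGHT_M_PER_S * t for t in row] for row in tof_image]


def squared_error(a: Image, b: Image) -> float:
    """Sum of squared differences between two images of equal size."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def evaluate(depth_image: Image, tof_image: Image, threshold: float) -> Tuple[float, bool]:
    """Return the error value and whether the two images approximate each other."""
    distance_image = distance_image_from_tof(tof_image)
    error = squared_error(depth_image, distance_image)
    return error, error <= threshold


depth = [[15.0, 30.0]]        # depth image values in metres (illustrative)
tof = [[100e-9, 200e-9]]      # TOF values in seconds (illustrative)
error, approximates = evaluate(depth, tof, threshold=1e-3)
print(f"error={error:.3e} approximates={approximates}")
```

When `approximates` is False, the result corresponds to the "not approximated" signal that could be fed back to the modeling stage.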

[Fifth Embodiment]
Next, a fifth embodiment of the present invention will be described. The first to fourth embodiments described above all relate to means for generating a near-infrared image or a virtual image of a LiDAR sensor. In this embodiment, control for the case where these virtual images are actually used to drive in real time under automated driving will be described. This embodiment illustrates a case where the simulator system of the present invention is applied to machine learning and testing of an image recognition function module in an automated driving system of an automobile.

Here, the automated driving system is a system such as ADAS (advanced driver assistance system) that detects and avoids the possibility of accidents in advance. To realize automated driving of the car, it recognizes video from the camera mounted on the vehicle using image recognition technology, detects other vehicles, pedestrians, traffic lights, and other objects, and automatically performs control such as speed reduction and avoidance.

(Overall configuration of vehicle synchronization simulator system)
FIG. 15 is a conceptual diagram showing the overall configuration of the simulator system according to the present embodiment. The simulator system according to the present embodiment executes a simulation program for one or a plurality of simulation targets, and executes tests and machine learning of these simulator programs. In this simulator system, as shown in FIG. 15, a simulator server 2 is arranged on a communication network 3, and an information processing terminal 1a and an in-vehicle device 1b that generate or acquire the position of the own vehicle are connected to the simulator server 2 through the communication network 3.

Here, the communication network 3 is an IP network using the communication protocol TCP/IP, and is a distributed communication network constructed by interconnecting various communication lines (public lines such as telephone lines, ISDN lines, ADSL lines, and optical lines; dedicated lines; third-generation (3G) communication systems such as WCDMA (registered trademark) and CDMA2000; fourth-generation (4G) communication systems such as LTE; fifth-generation (5G) and later communication systems; and wireless communication networks such as WiFi (registered trademark) and Bluetooth (registered trademark)). This IP network includes a LAN such as an intranet (in-company network) or a home network based on 10BASE-T or 100BASE-TX. In many cases, the simulator software is installed in the PC 1a, in which case the simulation can be performed on a single PC.

The simulator server 2 is composed of one or a plurality of server device groups, and the function of each server device is realized by a server computer that executes various information processing or software having the function. The simulator server 2 includes an application server configured by a server computer that executes application software for the server, or middleware that assists in managing the execution of the application on such a computer.

Further, the simulator server 2 includes a web server that processes HTTP requests and responses for client devices. This web server serves as a bridge to a database core layer that runs a back-end relational database management system (RDBMS), and is responsible for data processing and the like. The relational database server is a server on which a database management system (DBMS) is operating, and has functions for transmitting requested data to client devices and application servers (AP servers) and for accepting operation requests to rewrite or delete data.

The information processing terminal 1a and the in-vehicle device 1b are client devices connected to the communication network 3. Each is provided with an arithmetic processing device such as a CPU, and provides various functions by executing a dedicated client program 5. The information processing terminal can be realized by, for example, a general-purpose computer such as a personal computer or a dedicated device specialized in particular functions, and includes smartphones, mobile computers, PDAs (Personal Digital Assistants), mobile phones, wearable terminal devices, and the like.

The information processing terminal 1a or the in-vehicle device 1b can access the simulator server 2 through the dedicated client program 5 to transmit and receive data. The client program 5 is partly or wholly incorporated into a driving simulation system or an in-vehicle automated driving system. It uses image recognition technology on video captured by a camera mounted on the vehicle, or on captured landscape video (including CG video in this embodiment), to detect other vehicles, pedestrians, traffic lights, and other objects in the video, calculates the positional relationship between the own vehicle and each object based on the recognition result, and automatically performs control such as speed reduction and avoidance according to the calculation result. Note that the client program 5 according to the present embodiment has the simulator server 2 perform the image recognition function. The vehicle position information calculation unit 51 shown in FIG. 18 calculates or acquires the position information of the own vehicle so that it is displaced in accordance with the recognition result from the simulator server 2, either by having the automated driving mechanism run the automobile virtually on the map or by actually driving the vehicle.

(Configuration of each device)
Next, the configuration of each device will be specifically described. FIG. 16 is a block diagram illustrating an internal configuration of the client device according to the present embodiment, and FIG. 17 is a block diagram illustrating an internal configuration of the simulator server according to the present embodiment. The “module” used in the description refers to a functional unit that is configured by hardware such as an apparatus or a device, software having the function, or a combination thereof, and achieves a predetermined operation.

(1) Configuration of Client Device
The information processing terminal 1a can be realized by a general-purpose computer such as a personal computer or a dedicated device. The in-vehicle device 1b, on the other hand, may be a general-purpose computer such as a personal computer or a device mounted on a vehicle. Specifically, as shown in FIG. 16, the client device includes a CPU 102, a memory 103, an input interface 104, a storage device 101, an output interface 105, and a communication interface 106. In the present embodiment, these devices are connected via a CPU bus, and data can be exchanged between them.

The memory 103 and the storage device 101 are devices that store data in a recording medium and read out the stored data in response to a request from each device. For example, a hard disk drive (HDD), a solid state drive (SSD), a memory card, or the like can be used. The input interface 104 is a module that receives an operation signal from an operation device such as a keyboard, a pointing device, a touch panel, or a button. The received operation signal is transmitted to the CPU 102, allowing operations on the OS and each application. The output interface 105 is a module that transmits a video signal and an audio signal in order to output video and audio from an output device such as a display and a speaker.

In particular, when the client device is the in-vehicle device 1b, the input interface 104 is connected to a system such as the above-mentioned ADAS for automated driving. To realize automated driving of the automobile, it is connected to an image sensor such as the camera 104a mounted on the vehicle, as well as to various sensor means such as a LiDAR sensor, a millimeter wave sensor, and an infrared sensor.

The communication interface 106 is a module that transmits and receives data to and from other communication devices. The communication method may be, for example, a public line such as a telephone line, an ISDN line, an ADSL line, or an optical line; a dedicated line; a third-generation (3G) communication system such as WCDMA (registered trademark) or CDMA2000; a fourth-generation (4G) communication system such as LTE; a fifth-generation (5G) or later communication system; or a wireless communication network such as WiFi (registered trademark) or Bluetooth (registered trademark).

The CPU 102 is a device that performs various arithmetic processes necessary for controlling each unit, and virtually constructs various modules on the CPU 102 by executing various programs. On this CPU 102, an OS (Operating System) is activated and executed, and the basic functions of the information processing terminals 1a to 1c, 4 and 5 are managed and controlled by this OS. Various applications can be executed on the OS, and the OS program is executed by the CPU 102, whereby the basic functions of the information processing terminal are managed and controlled, and the application program is executed by the CPU 102. As a result, various functional modules are virtually constructed on the CPU.

In the present embodiment, the client side execution unit 102a is constructed by executing the client program 5 on the CPU 102. Through the client side execution unit 102a, the position information of the own vehicle on a virtual map or an actual map is generated or acquired and transmitted to the simulator server 2, and the recognition result for the landscape video (including CG video in the present embodiment) is received from the simulator server 2. The positional relationship between the own vehicle and the object is then calculated based on the received recognition result, and control such as speed reduction and avoidance is automatically performed according to the calculation result.

(2) Configuration of Simulator Server
The simulator server 2 according to the present embodiment is a server device group that provides a vehicle synchronization simulator service through the communication network 3, and the function of each server device is realized by a server computer that executes various information processing or by software having that function. Specifically, as shown in FIG. 17, the simulator server 2 includes a communication interface 201, a UDP synchronization control unit 202, a simulation execution unit 205, a UDP information transmission / reception unit 206, and various databases 210 to 213.

The communication interface 201 is a module that transmits and receives data to and from other devices through the communication network 3. The communication method may be, for example, a public line such as a telephone line, an ISDN line, an ADSL line, or an optical line; a dedicated line; a third-generation (3G) communication system such as WCDMA (registered trademark) or CDMA2000; a fourth-generation (4G) communication system such as LTE; a fifth-generation (5G) or later communication system; or a wireless communication network such as WiFi (registered trademark) or Bluetooth (registered trademark).

In FIG. 18, the UDP synchronization control unit 202 is a module that synchronously controls the position information calculation processing for displacing the host vehicle on the client device 1 side, and the image generation processing and image recognition processing on the simulator server 2 side. The vehicle position information calculation unit 51 on the client device 1 side acquires the recognition result in the image recognition unit 204 through the UDP information transmission / reception unit 206, and generates a control signal for controlling the operation of the vehicle using the acquired recognition result. The position information of the host vehicle is changed / corrected based on the generated control signal.

The UDP information transmission / reception unit 206 is a module that cooperates with the client-side execution unit 102a on the client device 1 side to transmit and receive data between them. In this embodiment, the position information calculated or acquired on the client device 1 side is packetized into a specific format and sent to the simulator server 2 side. The packetized data is transmitted via a network or a transmission bus in a specific device. The simulator server 2 side receives and de-packetizes the packet data, and the de-packetized data is input to the image generation unit 203 to generate an image. In this embodiment, the UDP information transmission / reception unit 206 uses UDP (User Datagram Protocol) to transmit and receive the signals exchanged between the devices by the UDP synchronization control unit 202.

The various databases include a map database 210, a vehicle database 211, and a drawing database 212. Note that these databases can be mutually referred to by a relational database management system (RDBMS).

The simulation execution unit 205 is a module that generates a simulation image reproducing the area specified by the position information, based on the position information generated or acquired by the position information acquisition unit on the client device 1 side and transmitted to the simulator server 2 side, and that recognizes and detects a specific object from the generated simulation image using a recognition function module. Specifically, it includes an image generation unit 203 and an image recognition unit 204.

The image generation unit 203 is a module that acquires the position information acquired or calculated by the position information acquisition unit on the client device 1 side and, based on this position information, generates a simulation image in which the area specified by the position information (a landscape based on the latitude and longitude on the map, the direction, and the field of view) is reproduced by computer graphics. The simulation image generated by the image generation unit 203 is sent to the image recognition unit 204. As the image generation unit 203, the near-infrared virtual image generation system described in the first embodiment or the LiDAR sensor virtual image generation system described in the second embodiment can be adopted. Thus, various virtual images generated using computer graphics technology may be input to the image recognition unit 204.

The image recognition unit 204 is a module that recognizes and detects a specific object from the simulation image generated by the image generation unit 203 using the recognition function module 204a that is a test target or a machine learning target. The recognition result information D06 by the image recognition unit 204 is transmitted to the vehicle position information calculation unit 51 on the client device 1 side. The image recognition unit 204 is provided with a learning unit 204b, and executes machine learning of the recognition function module 204a.

The recognition function module 204a is a module that acquires an image captured by the camera device or a CG image generated by the image generation unit 203, hierarchically extracts a plurality of feature points in the acquired image, and recognizes an object from the hierarchical combination pattern of the extracted feature points. The learning unit 204b inputs images captured by the camera device or virtual CG images to the recognition function module 204a so that feature points are extracted from images that are actually difficult to shoot or reproduce, thereby diversifying the extraction patterns and improving learning efficiency.

As the recognition function module 204a of the image recognition unit here, the neural network calculation unit 17 of the virtual image deep learning recognition system described in the third embodiment can be applied, and as the learning unit 204b, The above-described back propagation unit 18 can be applied.

(Vehicle synchronization simulator system method)
By operating the vehicle synchronization simulator system having the above configuration, the vehicle synchronization simulation method of the present invention can be implemented. FIG. 4 is a block diagram showing a configuration and operation related to image generation / image recognition in the present embodiment. FIG. 19 is a flowchart showing the processing procedure of the synchronization simulator in the present embodiment.

First, the vehicle position information calculation unit 51 acquires the vehicle position information D02 of the own vehicle (S101). Specifically, various data groups D01 such as map information and vehicle initial data are input to the vehicle position information calculation unit 51 by executing the client program 5 on the client device 1 side. Next, using these various data groups D01, the position information of the own vehicle on the virtual map or the actual map is calculated (generated) or acquired, and the result is used as the vehicle position information D02, and the UDP synchronization control unit 202 and the UDP The information is transmitted to the simulation execution unit 205 on the simulator server 2 side through the information transmission / reception unit 206 (S102).

More specifically, the vehicle position information calculation unit 51 first sends the vehicle position information D02 of the own vehicle to the UDP synchronization control unit 202 according to the timing of the control signal D03 from the UDP synchronization control unit 202. As initial data of the vehicle position information calculation unit 51, for example, map data, position information of the own vehicle in the map, and information such as the rotation angle and diameter of the wheels of the vehicle body can be loaded from a predetermined storage device 101. The UDP synchronization control unit 202 and the UDP information transmission / reception unit 206 transmit and receive data in cooperation with the client side execution unit 102a on the client device 1 side. Specifically, the UDP synchronization control unit 202 and the UDP information transmission / reception unit 206 packetize the vehicle position information D02 calculated or acquired on the client device 1 side, together with various data groups including vehicle information, into a specific format, and send the result to the simulator server 2 side as packet information D04.

The packetized data is transmitted via a network or a transmission bus in a specific device. The simulator server 2 receives and de-packetizes the packet data (S103), and the de-packetized data D05 is input to the image generation unit 203 of the simulation execution unit 205 to generate a CG image. Here, the UDP synchronization control unit 202 packetizes the various data groups including vehicle information into packet information D04, which the UDP information transmission / reception unit 206 transmits and receives between the devices using UDP (User Datagram Protocol).

More specifically, the UDP synchronization control unit 202 converts the various data groups into packet information D04 by UDP-packetizing the vehicle position information D02 of the own vehicle. This facilitates transmission / reception using the UDP protocol. A brief note on UDP (User Datagram Protocol): in general, TCP is a connection-oriented, highly reliable protocol that performs window control, retransmission control, and congestion control, whereas UDP is a connectionless protocol with no mechanism to ensure reliability, but has the major advantage of low delay because its processing is simple. In this embodiment, low delay is required when data is transmitted between the components, so UDP is used instead of TCP. Alternatively, RTP (Real-time Transport Protocol), which is widely used in current voice and video communication, may be used.

Here, the specific content of the vehicle position information D02 of the own vehicle is, for example, as follows:
・ Position information of the own vehicle (three-dimensional coordinates (X, Y, Z) such as road surface absolute position coordinates)
・ Euler angles of the own vehicle
・ Tire position information (three-dimensional coordinates (X, Y, Z) such as absolute position coordinates of the tire on the road surface)
・ Information such as wheel rotation angle, steering wheel, and brake pedal.
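As an illustration of how the items listed above might be carried in a single record, the following Python sketch serializes them into a fixed-size byte string. The field names, the use of a single tire position, and the wire layout of twelve little-endian 64-bit floats are assumptions made for this example; the source does not specify a format.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Assumed layout (not from the source): 12 little-endian float64 values.
D02_FORMAT = "<12d"

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class VehiclePositionInfo:
    """Hypothetical container for the vehicle position information D02."""
    position: Vec3               # own vehicle (X, Y, Z), road surface absolute coordinates
    euler_angles: Vec3           # Euler angles of the own vehicle
    tire_position: Vec3          # (X, Y, Z) of one tire; a real record may hold all tires
    wheel_rotation_angle: float
    steering: float
    brake: float

    def to_bytes(self) -> bytes:
        """Serialize all fields in a fixed order."""
        return struct.pack(
            D02_FORMAT,
            *self.position, *self.euler_angles, *self.tire_position,
            self.wheel_rotation_angle, self.steering, self.brake,
        )

    @classmethod
    def from_bytes(cls, data: bytes) -> "VehiclePositionInfo":
        """Inverse of to_bytes()."""
        v = struct.unpack(D02_FORMAT, data)
        return cls(tuple(v[0:3]), tuple(v[3:6]), tuple(v[6:9]), v[9], v[10], v[11])
```

A fixed-size binary record of this kind fits comfortably in one UDP datagram, which suits the low-delay requirement described above.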

The UDP information transmission / reception unit 206, having received the vehicle position information D02, sends out the data D05 necessary for generating a CG image of the vehicle, drawn mainly from the vehicle information such as the XYZ coordinates that are the vehicle position information, the XYZ coordinates that are the tire position information, and the Euler angles.

The packet information D04, in which the various data groups are UDP-packetized, is divided into a packet header and a data body payload by the de-packetizing process in the UDP information transmission / reception unit 206. Here, the UDP packet data may be exchanged over a network between distant locations or over a transmission bus within a single device such as a simulator. Data D05 corresponding to the payload is input to the image generation unit 203 of the simulation execution unit 205 (S104).
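The split into packet header and payload can be sketched in Python as follows. The header layout used here (a two-byte magic value, a frame number, and a payload length) is purely hypothetical, since the source only says that a "specific format" is used.

```python
import struct
from typing import Tuple

# Hypothetical header: 2-byte magic, uint32 frame number, uint16 payload length.
HEADER_FORMAT = "<2sIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAGIC = b"VP"


def packetize(frame_no: int, payload: bytes) -> bytes:
    """Prefix the payload with a header (the role of packet information D04)."""
    return struct.pack(HEADER_FORMAT, MAGIC, frame_no, len(payload)) + payload


def depacketize(packet: bytes) -> Tuple[int, bytes]:
    """Split a packet into (frame number, payload); the payload plays the role of D05."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than header")
    magic, frame_no, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    payload = packet[HEADER_SIZE:]
    if len(payload) != length:
        # UDP does not retransmit, so a damaged datagram is simply rejected.
        raise ValueError("payload length mismatch")
    return frame_no, payload
```

Carrying a frame number in the header is one way to let the receiver detect lost or reordered datagrams, which UDP itself does not report.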

In the simulation execution unit 205, the image generation unit 203 acquires, as data D05, the position information acquired or calculated by the position information acquisition unit on the client device 1 side, and generates a simulation image in which the region specified by that position information (the landscape based on the latitude / longitude, direction, and field of view on the map) is reproduced by computer graphics (S105). The simulation image D13 generated by the image generation unit 203 is sent to the image recognition unit 204.

The image generation unit 203 generates a realistic image by a predetermined image generation method, for example, a CG image generation technique using the latest physically based rendering (PBR) method. The recognition result information D06 is input again to the vehicle position information calculation unit 51 and is used, for example, to calculate vehicle position information for determining the next operation of the own vehicle.

The image generation unit 203 can generate not only the vehicle but also its surroundings, for example, objects such as road surfaces, buildings, traffic lights, other vehicles, and pedestrians, using, for example, the CG technique based on the PBR method. That this is well within reach of the latest CG technology can be seen from titles for game consoles such as PlayStation, in which objects like these are rendered very realistically. In many cases, images of objects other than the own vehicle are already stored as initial data. In particular, in an automatic driving simulator, a large amount of sample data on highways and ordinary roads is stored in a database, and these data may be used as appropriate.

Subsequently, the image recognition unit 204 recognizes and extracts a specific target as an object from the simulation image generated by the image generation unit 203, using the recognition function module 204a that is the test target or machine learning target (S106). If no object is recognized (“N” in step S107), the process proceeds to the next time frame (S109), and the above processing S101 to S107 is repeated (“Y” in step S109) until no time frame remains (“N” in step S109).

On the other hand, if a recognized object exists in step S107 (“Y” in step S107), the recognition result of the image recognition unit 204 is sent as recognition result information D06 to the vehicle position information calculation unit 51 on the client device 1 side. The vehicle position information calculation unit 51 acquires the recognition result information D06 through the UDP information transmission / reception unit 206, generates a control signal for controlling the operation of the vehicle using the acquired recognition result, and changes / corrects the position information of the own vehicle based on the generated control signal (S108).
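The flow of steps S101 to S109 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the four callables (`advance`, `generate_image`, `recognize`, `correct`) are hypothetical stand-ins for the vehicle position information calculation unit 51, the image generation unit 203, the image recognition unit 204, and the position correction of S108, and the position is reduced to an (x, y) pair.

```python
from typing import Any, Callable, List, Sequence, Tuple

Position = Tuple[float, float]  # simplified; the source uses richer position data


def run_simulation(
    initial: Position,
    num_frames: int,
    advance: Callable[[Position], Position],
    generate_image: Callable[[Position], Any],
    recognize: Callable[[Any], Sequence[Any]],
    correct: Callable[[Position, Sequence[Any]], Position],
) -> List[Position]:
    """Run the per-frame loop and return the position after each frame."""
    position = initial
    history: List[Position] = []
    for _frame in range(num_frames):       # S109: continue while time frames remain
        position = advance(position)       # S101: acquire position for this frame
        image = generate_image(position)   # S102 to S105: transmit and render
        detections = recognize(image)      # S106: recognition function module
        if detections:                     # S107: was any object recognized?
            position = correct(position, detections)  # S108: change / correct position
        history.append(position)
    return history
```

Because the position used in frame n+1 depends on the recognition result of frame n, the steps cannot be run as independent pipelines, which is why the whole loop is controlled synchronously.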

More specifically, the simulation image D13, which is the CG image generated here, is input to the image recognition unit 204, which recognizes and detects objects using a recognition technique such as deep learning as described above. The recognition result is given as area information on the screen (for example, XY two-dimensional coordinates of an extracted rectangular area) for other vehicles, pedestrians, signs, traffic lights, and the like.

When a simulator for automatic driving is executed, the images during actual vehicle driving contain many objects such as other vehicles, pedestrians, buildings, and road surfaces. To realize automatic driving, operations such as turning the steering wheel, stepping on the accelerator, and applying the brake are performed automatically while real-time information is acquired from various sensors mounted on the vehicle, such as camera images, millimeter waves, and radar waves.

Therefore, in the case of the near-infrared image of the first embodiment, the objects necessary for automatic traveling, such as other vehicles, pedestrians, signs, and traffic lights, are recognized and identified from among the objects shown in the screen by an image recognition technique such as the deep learning described in the third embodiment.

For example, when another vehicle cuts in ahead of the own vehicle, the image recognition unit 204 detects the approach by image recognition and outputs recognition result information D06 to the vehicle position information calculation unit 51. Based on this information, the vehicle position information calculation unit 51 changes the position information of the own vehicle by, for example, turning the steering wheel to avoid the other vehicle or decelerating by a brake operation. Likewise, when a pedestrian suddenly runs out in front of the own vehicle, operations such as turning the steering wheel to avoid the pedestrian or braking suddenly are performed, and the position information of the own vehicle is changed as a result.

In the series of configurations described above, data is sent from the vehicle position information calculation unit 51 to the simulation execution unit 205 via the UDP synchronization control unit 202 and the UDP information transmission / reception unit 206 using the UDP protocol at a cycle of, for example, 25 msec (25 msec is only an example).

The “synchronization model”, which is a feature of the present invention, is necessary because the vehicle position information of the next time frame is determined based on the output of the simulation execution unit 205; unless the whole is controlled synchronously, the behavior of an actual vehicle cannot be simulated. Transmission is performed at a cycle of 25 msec; zero delay would be ideal but is impossible in practice, so UDP is used to reduce the delay associated with transmission / reception.
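The effect of delay on a synchronously controlled loop can be illustrated with a small calculation. The sketch below assumes a simple lockstep policy in which a frame starts one cycle after the previous one, or later if the previous frame's result took longer than a cycle to come back. This policy and the function name are assumptions for illustration; the source does not state how late results are handled.

```python
from typing import List

CYCLE_MS = 25.0  # example cycle given in the text


def lockstep_start_times(round_trip_ms: List[float], cycle_ms: float = CYCLE_MS) -> List[float]:
    """Start time of each frame when frame i+1 must wait for the result of frame i."""
    starts: List[float] = []
    t = 0.0
    for latency in round_trip_ms:
        starts.append(t)
        # The next frame starts one cycle later, or when this frame's
        # result arrives if that takes longer than one cycle.
        t += max(cycle_ms, latency)
    return starts
```

With round-trip times of 10, 30, and 5 msec, the second frame starts on schedule at 25 msec, but the third is pushed back to 55 msec because the second frame's result arrived 5 msec late. Keeping transport delay small is what keeps the simulated time base close to the nominal cycle.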

In general, a simulator for automatic driving must be tested on a very large number of video frames. The object of the present embodiment is to substitute CG images that closely resemble real images for the sheer amount of footage that cannot be covered by actual driving. It is therefore necessary to guarantee operation on long sequences of moving image sample data.

In the present embodiment, the learning unit 204b inputs to the recognition function module 204a not only images captured by the in-vehicle camera device during actual traveling but also virtual CG images generated by the image generation unit 203. This allows feature points to be extracted from images that are difficult to capture or reproduce in practice, diversifying the extraction patterns and improving learning efficiency. The recognition function module 204a acquires an image captured by the camera device or a CG image, extracts a plurality of feature points in the acquired image hierarchically, and performs recognition from the hierarchical combination pattern of the extracted feature points using the deep learning recognition technique already described in the third embodiment.

[Sixth Embodiment]
Hereinafter, a sixth embodiment of the system according to the present invention will be described in detail with reference to the accompanying drawings. FIG. 20 is a conceptual diagram showing the overall configuration of the system according to the present embodiment. FIG. 21 is a block diagram showing an internal configuration of the apparatus according to the present embodiment. The fifth embodiment was mainly limited to the case of a single own vehicle, whereas this embodiment illustrates a case where position information for a large number of vehicles is processed simultaneously in parallel.

As shown in FIG. 20, in this embodiment, a plurality of client devices 1c to 1f are connected to the simulator server 2. As shown in FIG. 21, in the simulator server 2 the UDP synchronization control unit 202 and the UDP information transmission / reception unit 206 are common components. According to the number of vehicles to be simulated, vehicle position information calculation units 51c to 51f are provided in the client devices 1c to 1f, and simulation execution units 205c to 205f are provided on the simulator server 2 side.

The vehicle position information calculation units 51c to 51f send vehicle position information D02c to f of the own vehicle to the UDP synchronization control unit 202 according to the timing of the control signals D03c to f. Next, the UDP synchronization control unit 202 converts the vehicle position information D02c to f of the own vehicle into packet information D04 including various data groups by UDP packetizing. This facilitates transmission / reception using the UDP protocol. The packet information D04 is divided into a packet header and a data body payload by the de-packetizing process in the UDP information transmitting / receiving unit 206. Here, the exchange of UDP packet data may be performed using a network between distant locations, or may be performed between transmission buses within a single device such as a simulator. Data D05c to f corresponding to the payload are input to the simulation execution units 205c to 205f.
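One way a shared transmission / reception unit could hand each vehicle's payload to the matching simulation execution unit is to carry a vehicle identifier in the packet header. The Python sketch below assumes a hypothetical three-byte header (a one-character vehicle ID and a payload length); neither the header nor the function names come from the source.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical header: 1-byte vehicle ID ("c" to "f"), uint16 payload length.
HEADER_FORMAT = "<cH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def packetize(vehicle_id: str, payload: bytes) -> bytes:
    """Build a packet for one vehicle (vehicle_id must be a single ASCII character)."""
    return struct.pack(HEADER_FORMAT, vehicle_id.encode("ascii"), len(payload)) + payload


def dispatch(packets: Iterable[bytes]) -> Dict[str, List[bytes]]:
    """Group payloads by vehicle ID, preserving arrival order for each vehicle."""
    per_vehicle: Dict[str, List[bytes]] = defaultdict(list)
    for packet in packets:
        vid, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
        per_vehicle[vid.decode("ascii")].append(packet[HEADER_SIZE:HEADER_SIZE + length])
    return dict(per_vehicle)
```

Each list in the result would then be consumed by the simulation execution unit for that vehicle (205c for vehicle "c", and so on).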

As described in the first embodiment, the simulation execution units 205c to 205f generate highly realistic images by a predetermined image generation method, for example, a CG image generation technique using the latest physically based rendering (PBR) method. The recognition result information D06c to f is fed back to the vehicle position information calculation units 51c to 51f to change the position of each vehicle.

In the above example, a total of four vehicle position information calculation units 51c to 51f are provided, but the number is not particularly limited. However, as the number of vehicles increases, the synchronous control becomes more complicated, and when delays occur for a given vehicle, the overall delay is the sum of the individual delays, so the total delay time increases. The configuration should therefore be chosen according to conditions such as the hardware scale and processing load of the simulator.
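As a rough worked example of the statement that the total delay is the sum of the individual delays, the following Python sketch adds per-vehicle delays and estimates how many vehicles fit within one cycle. The delay figures and function names are invented for illustration only.

```python
from typing import Dict


def total_delay_ms(per_vehicle_delay_ms: Dict[str, float]) -> float:
    """Total delay when per-vehicle delays accumulate (simple sum)."""
    return sum(per_vehicle_delay_ms.values())


def max_vehicles_within(cycle_ms: float, delay_per_vehicle_ms: float) -> int:
    """Largest number of vehicles whose summed delay still fits in one cycle."""
    return int(cycle_ms // delay_per_vehicle_ms)
```

For instance, hypothetical delays of 3, 5, 2, and 12 msec for vehicles c to f add up to 22 msec, just inside a 25 msec cycle; at a uniform 6 msec per vehicle, at most four vehicles would fit.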

In FIG. 20, the PC terminals 1c to 1f and the vehicle synchronization simulator program 4 are connected remotely via the communication network 3, but the program can also be installed on a recording medium such as a local HDD or SSD of the PC and operated stand-alone. This has the advantages that verification can be performed with lower delay and that operation is unaffected by network congestion.

Further, 1c to 1f need not be limited to PC terminals. For example, when a test is performed on an actual traveling vehicle, a car navigation system mounted on the test vehicle may be used. In this case, instead of having the image recognition unit 204 recognize the simulation image D13, which is the CG image from the image generation unit 203 in FIG. 18, live-action driving video is input in place of D13, so that the setup can be used to evaluate the performance of the image recognition unit 204. For example, a human viewer can recognize pedestrians and vehicles in live-action driving video instantly and accurately, and this makes it possible to verify whether the image recognition unit 204 recognizes and extracts the same results.

[Seventh Embodiment]
Furthermore, a seventh embodiment of the system of the present invention will be described. In this embodiment, another example using a plurality of sensors will be described. FIG. 22 shows an example in which sensors of different devices are mounted. In FIG. 22, it is assumed that one of the deep learning recognition units handles, for example, an image sensor of a camera, and the other handles, for example, a near-infrared sensor or LiDAR (Light Detection and Ranging).

As shown in FIG. 22, the first deep learning recognition unit 61 takes its input from, for example, an image sensor unit, and the 3D graphics composite image is a two-dimensional plane image. Its deep learning recognition means is therefore a recognition method for two-dimensional images. The next deep learning recognition unit 62, on the other hand, takes 3D point cloud data from a LiDAR sensor as input. The 3D point cloud data is converted into a 3D graphic image by the image generation unit 203.

The 3D point cloud data converted into a 3D graphic image is, as shown in FIG. 10, point cloud data obtained by emitting laser light in all directions from a LiDAR installed on the traveling vehicle at the center and measuring the reflected light. The intensity of the color indicates the intensity of the reflected light. Portions where nothing is present, such as gaps, are therefore black because there is no reflected light.
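The mapping from reflected-light intensity to displayed brightness can be sketched as a simple top-down rasterization. This Python example is an assumption-laden illustration, not the rendering used for FIG. 10: points are taken to be (x, y, z, intensity) tuples with intensity in [0, 1], the sensor sits at the image center, and cells that receive no return stay at 0 (black).

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, intensity in [0, 1])


def rasterize_top_down(points: Iterable[Point], size: int = 64, extent: float = 50.0) -> List[List[int]]:
    """Project points onto a size x size grayscale grid covering [-extent, extent] in x and y."""
    image = [[0] * size for _ in range(size)]  # 0 = black = no reflected light
    scale = size / (2.0 * extent)
    for x, y, _z, intensity in points:
        col = math.floor((x + extent) * scale)
        row = math.floor((extent - y) * scale)  # +y is drawn toward the top of the image
        if 0 <= row < size and 0 <= col < size:
            level = int(round(max(0.0, min(1.0, intensity)) * 255))
            # Keep the strongest return that falls into a cell.
            image[row][col] = max(image[row][col], level)
    return image
```

Points outside the covered area are ignored, and gaps between objects simply remain black, matching the description above.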

Since targets such as other vehicles, pedestrians, and bicycles can be acquired from actual point cloud data as data with three-dimensional coordinates, 3D graphic images of these targets can be generated easily. Specifically, a plurality of polygon data are generated by matching the point cloud data, and the 3D graphics are drawn by rendering these polygon data.
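The source does not say how the polygon data are derived from the point cloud. As one simple possibility, the Python sketch below assumes an organized scan, in which points are arranged in a grid of scan lines by azimuth steps, and connects neighboring points into triangles. Missing returns are represented by None. This is a swapped-in illustrative technique, not the matching method of the embodiment.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Index = Tuple[int, int]                 # (row, column) into the organized scan
Triangle = Tuple[Index, Index, Index]


def triangulate_grid(grid: List[List[Optional[Point3D]]]) -> List[Triangle]:
    """Split every grid cell whose four corners all have returns into two triangles."""
    triangles: List[Triangle] = []
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left, top_right = (r, c), (r, c + 1)
            bottom_left, bottom_right = (r + 1, c), (r + 1, c + 1)
            corners = (top_left, top_right, bottom_left, bottom_right)
            # Skip cells that touch a missing return (no reflected light).
            if all(grid[i][j] is not None for i, j in corners):
                triangles.append((top_left, top_right, bottom_left))
                triangles.append((top_right, bottom_right, bottom_left))
    return triangles
```

The resulting index triples, together with the point coordinates, are the kind of polygon data a renderer can draw directly.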

The 3D point cloud data graphic image D61 generated by the above means is input to the deep learning recognition unit 62, where recognition is performed by recognition means trained on 3D point cloud data. This means differs from the deep learning recognition means trained on images for the image sensor, but the benefit is large: an oncoming vehicle that is very far away is likely to be missed by an image sensor, whereas LiDAR can capture the size and shape of an oncoming vehicle several hundred meters away. Conversely, because LiDAR relies on reflected light, it is ineffective for objects that do not reflect, a problem that does not arise with an image sensor.

As described above, a plurality of sensors with different properties, or different devices, are provided; their recognition results are analyzed by the learning result synchronization unit 84, and the final recognition result D62 is output. This synchronization unit may be run externally over a network, for example in the cloud. The number of sensors per vehicle is expected to increase rapidly, and the computational load of deep learning recognition is large, so it is effective to execute the parts that can be handled externally in a cloud with large-scale computing power and feed back the results.
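How the recognition results of different sensors are combined is not specified in the source. The Python sketch below shows one common approach as an illustration only: detections from two recognizers are matched by label and by intersection over union (IoU) of their rectangles, matched pairs are marked as confirmed by both sensors, and unmatched detections are kept with their single source. It assumes both recognizers report rectangles in the same image coordinates; all names and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, Box]               # (label, box)
Fused = Tuple[str, Box, Tuple[str, ...]]  # (label, box, contributing sensors)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(camera: List[Detection], lidar: List[Detection], threshold: float = 0.5) -> List[Fused]:
    """Merge two detection lists; overlapping same-label boxes count as one object."""
    fused: List[Fused] = []
    unmatched = list(lidar)
    for label, box in camera:
        match = next((d for d in unmatched if d[0] == label and iou(box, d[1]) >= threshold), None)
        if match is not None:
            unmatched.remove(match)
            fused.append((label, box, ("camera", "lidar")))
        else:
            fused.append((label, box, ("camera",)))
    # Objects seen only by LiDAR (e.g. distant vehicles) are kept as well.
    fused.extend((label, box, ("lidar",)) for label, box in unmatched)
    return fused
```

Keeping single-sensor detections reflects the complementary strengths described above: LiDAR can report distant objects the camera misses, and the camera can report objects that do not reflect laser light.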

Note that the example of FIG. 22 concerns generating virtual CG images, but as already described in the first embodiment, the system of the present application can also be mounted on an actual vehicle (like a car navigation system) so that deep learning recognition is performed on information input from different types of sensors during actual driving. FIG. 23 is a specific block diagram of that case.

The material photographing device is assumed to include a LiDAR sensor and a millimeter wave sensor as described above, in addition to the image sensor provided in the in-vehicle camera. In the case of the image sensor, a captured image is taken in, and a high-quality CG image is generated and sent out by the PBR technique described in the first embodiment, using parameters such as light information extracted from the captured image. In the case of the LiDAR sensor, three-dimensional point cloud data is created from the reflected light of the laser beam actually emitted from the in-vehicle LiDAR sensor, and an image obtained by converting the three-dimensional point cloud data into 3DCG is output from the image generation unit 203.

In this way, CG images corresponding to a plurality of types of sensors are sent from the image generation unit 203, and recognition processing is performed by predetermined means in each deep learning recognition unit in FIG. In the above embodiment, the LiDAR sensor has been described as an example, but it is also effective to use the near infrared sensor described in the second embodiment.

DESCRIPTION OF SYMBOLS
D01 ... Various data groups
D02 (D02c to f) ... Vehicle position information
D03 (D03c to f) ... Control signal
D04 ... Packet information
D05 (D05c to f) ... Data
D06 (D06c to f) ... Recognition result information
D100 ... Scenario information
D101 ... Model data
D102 ... Modeling information
D103, D106 ... Shading image
D104 ... Gray scale image
D105, D108, D112 ... Depth image
D107 ... 3D shape data
D109 ... Error value
D110 ... Calculation value
D111 ... Gray scale image
D113 ... TOF value
D114 ... Distance image
D115 ... Comparison result
D13 ... Simulation image
D61 ... 3D point cloud data graphic image
D62 ... Recognition result
1 ... Client device
1a ... Information processing terminal
1b ... In-vehicle device
1c-1f ... Client device
2 ... Simulator server
3 ... Communication network
4 ... Vehicle synchronization simulator program
5 ... Client program
10 ... Scenario creation unit
114 ... LiDAR scanning device
114a ... Laser driver
114b ... Light emitting element
114c ... Irradiating lens
114d ... Light receiving lens
114e ... Light receiving element
114f ... Signal light receiving circuit
114g ... Mirror
11 ... 3D modeling unit
12 ... 3D shading unit
13 ... R image gray scale conversion unit
14 ... Depth image generation unit
15 ... Shading unit
15a ... Laser light irradiation part extraction unit
16 ... Depth image generation unit
16a ... Laser light irradiation part extraction unit
17 ... Neural network calculation unit
18 ... Back propagation unit
19 ... TOF calculation unit
20 ... Distance image generation unit
21 ... Comparison evaluation unit
51 (51c to f) ... Vehicle position information calculation unit
61 to 6n ... Deep learning recognition unit
84 ... Learning result synchronization unit
101 ... Storage device
102 ... CPU
102a ... Client side execution unit
103 ... Memory
104 ... Input interface
105 ... Output interface
106, 201 ... Communication interface
202 ... UDP synchronization control unit
203 ... Image generation unit
204 ... Image recognition unit
204a ... Recognition function module
204b ... Learning unit
205 ... Simulation execution unit
205c to f ... Simulation execution unit
206 ... UDP information transmission / reception unit
210 ... Map database
210-213 ... Various databases
211 ... Vehicle database
212 ... Drawing database
402 ... CPU
611 ... Output

Claims

An image generation system for generating, as computer graphics, a virtual image to be input to a sensor means, the system comprising:
a scenario creation unit that creates a scenario relating to the arrangement and behavior of an object present in the virtual image;
a 3D modeling unit that performs modeling for each object based on the scenario;
a 3D shading unit that performs shading for each model generated by the modeling unit and generates a shading image for each model;
a component extraction unit that extracts a predetermined component included in the shading image and outputs it as a component image; and
a depth image generation unit that generates a depth image in which depth is defined, based on information on the three-dimensional shape of each object in the component image.
2. The image generation system according to claim 1, wherein the component is an R component of an RGB image.
The image generation system according to claim 1, further comprising a gray scale conversion unit configured to gray scale the component.
An image generation system for generating, as computer graphics, a virtual image to be input to a sensor means, the system comprising:
a scenario creation unit that creates a scenario relating to the arrangement and behavior of an object present in the virtual image;
a 3D modeling unit that performs modeling for each object based on the scenario;
a 3D shading unit that performs shading for each model generated by the modeling unit and generates a shading image for each model; and
a depth image generation unit that generates a depth image in which depth is defined, based on information on the three-dimensional shape of each object,
wherein the shading unit has
a function of performing shading only on a predetermined portion of the model at which light rays emitted from the sensor means are reflected, and
a function of outputting only the three-dimensional shape of the predetermined portion, and
the depth image generation unit generates a depth image for each object based on information on the three-dimensional shape of the predetermined portion.
The image generation system according to claim 1 or 4, wherein the sensor means is a near infrared sensor.
The image generation system according to claim 1 or 4, wherein the sensor means is a LiDAR sensor that detects reflected light of irradiated laser light.
The image generation system according to claim 1, wherein the scenario creation unit includes means for determining three-dimensional shape information of the object, operation information of the object, material information of the object, parameter information of the light source, camera position information, and sensor position information.
The image generation system according to claim 1, further comprising deep learning recognition learning means that acquires a component image and a depth image based on a real image as teacher data, and trains a neural network by back propagation based on the component image, the depth image generated by the depth image generation unit, and the teacher data.
The image generation system according to claim 4, further comprising deep learning recognition learning means that acquires an irradiation image and a depth image based on live-action footage as teacher data, and trains a neural network by back propagation based on the image of the shading result from the shading unit, the depth image generated by the depth image generation unit, and the teacher data.
A TOF calculation unit that calculates, as TOF information, the time required from irradiation of a light beam to reception of its reflected wave, from the depth image generated by the depth image generation unit;
A distance image generation unit that generates a distance image based on the TOF information by the TOF calculation unit;
5. The image generation system according to claim 1, further comprising a comparison evaluation unit that compares the degree of coincidence between the distance image generated by the distance image generation unit and the depth image generated by the depth image generation unit.
The image generation system according to claim 10, wherein the modeling unit has a function of acquiring the comparison result from the comparison evaluation unit as feedback information, adjusting conditions of the modeling based on the acquired feedback information, and performing the modeling again.
12. The image generation system according to claim 11, wherein the modeling and the acquisition of the feedback information based on the comparison are repeated until the matching error in the comparison result from the comparison evaluation unit becomes smaller than a predetermined threshold.
A simulation system of a recognition function module for an image that changes with displacement of vehicle position information, the system comprising:
position information acquisition means for acquiring position information of the vehicle relative to surrounding objects based on a detection result by the sensor means;
image generation means for generating, based on the position information acquired by the position information acquisition means, a simulation image that reproduces the area specified by the position information;
image recognition means for recognizing and detecting a specific object from the simulation image generated by the image generation means, using the recognition function module;
position information calculation means for generating a control signal for controlling the operation of the vehicle using the recognition result of the image recognition means, and for changing / correcting the position information of the own vehicle based on the generated control signal; and
synchronization control means for synchronously controlling the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means.
14. The simulation system according to claim 13, wherein the synchronization control means includes:
means for packetizing the position information in a specific format and sending it out;
means for transmitting the packetized data via a network or a transmission bus in a specific device;
means for receiving and de-packetizing the packet data; and
means for inputting the de-packetized data and generating an image.
14. The simulation system according to claim 13, wherein the synchronization control means transmits / receives a signal transmitted / received between the means by using UDP (User Datagram Protocol).
The simulation system according to claim 13, wherein the vehicle position information includes any of the XYZ coordinates of the road surface absolute position of the vehicle, the XYZ coordinates of the road surface absolute position of the tires, the Euler angles of the vehicle, and the wheel rotation angle.
14. The simulation system according to claim 13, wherein the image generation means includes means for synthesizing a three-dimensional shape of the vehicle by computer graphics.
The simulation system according to claim 13, wherein
the vehicle is set for a plurality of vehicles, and the recognition function module is operated for each vehicle,
the position information calculation means changes / corrects the position information of each of the plurality of vehicles using the recognition result from the recognition means, and
the synchronization control means executes synchronization control of the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means for the plurality of vehicles.
14. The simulation system according to claim 13, wherein the image generation means includes means for generating a different image for each sensor means.
14. The simulation system according to claim 13, wherein the sensor means includes any one or all of an image sensor means, a LiDAR sensor, a millimeter wave sensor, and an infrared sensor.
The simulation system according to claim 13, comprising means for generating images corresponding to a plurality of sensors and recognition means corresponding to each generated image, and further comprising means for performing the synchronization control using the plurality of recognition results.
The simulation system according to claim 13, wherein the image generation system according to claim 1 or 4 is provided as the image generation means, and
a depth image generated by the depth image generation unit of the image generation system is input to the image recognition means as the simulation image.
An image generation program for generating, as computer graphics, a virtual image to be input to a sensor means, the program causing a computer to function as:
a scenario creation unit that creates a scenario relating to the arrangement and behavior of an object present in the virtual image;
a 3D modeling unit that performs modeling for each object based on the scenario;
a 3D shading unit that performs shading for each model generated by the modeling unit and generates a shading image for each model;
a component extraction unit that extracts a predetermined component included in the shading image and outputs it as a component image; and
a depth image generation unit that generates a depth image in which depth is defined, based on information on the three-dimensional shape of each object in the component image.
An image generation program for generating, as computer graphics, a virtual image to be input to sensor means, the program causing a computer to function as:
A scenario creation unit that creates a scenario relating to the arrangement and behavior of objects present in the virtual image;
A 3D modeling unit that performs modeling for each object based on the scenario;
A 3D shading unit that performs shading for each model generated by the 3D modeling unit and generates a shading image for each model; and
A depth image generation unit that generates a depth image in which depth is defined based on information on the three-dimensional shape of each object,
wherein the 3D shading unit has:
a function of performing shading only on a predetermined portion of the model at which light rays emitted from the sensor means are reflected; and
a function of outputting only the three-dimensional shape of the predetermined portion, and
the depth image generation unit generates a depth image for each object based on information on the three-dimensional shape of the predetermined portion.
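The restriction in the claim above, processing only the portion of each model that reflects light emitted by the sensor and producing a depth image per object, can be sketched as a simple planar scan. This is an illustration under stated assumptions, not the claimed implementation: the circular targets, the 360-beam horizontal scan, the sensor at the origin, and the names `lidar_scan` and `depth_images` are all hypothetical.

```python
import math


def lidar_scan(targets, n_beams=360, max_range=50.0):
    """Cast n_beams rays in the horizontal plane from a sensor at the origin.

    targets maps a name to (cx, cy, radius) in metres (hypothetical circular
    cross-sections). Returns, per target, only the surface points that
    actually reflect a beam: a list of (beam_index, x, y, range).
    """
    reflected = {name: [] for name in targets}
    for k in range(n_beams):
        az = 2 * math.pi * k / n_beams
        dx, dy = math.cos(az), math.sin(az)
        nearest = None
        for name, (cx, cy, r) in targets.items():
            b = dx * cx + dy * cy
            disc = b * b - (cx * cx + cy * cy - r * r)
            if disc < 0:
                continue  # beam misses this target
            t = b - math.sqrt(disc)
            if 0 < t <= max_range and (nearest is None or t < nearest[1]):
                nearest = (name, t)
        if nearest is not None:
            # Only the first surface along the beam reflects it; occluded
            # surfaces contribute nothing.
            name, t = nearest
            reflected[name].append((k, t * dx, t * dy, t))
    return reflected


def depth_images(reflected, n_beams=360):
    """One depth image per target: range per beam index, None where no return."""
    images = {}
    for name, points in reflected.items():
        image = [None] * n_beams
        for k, _x, _y, t in points:
            image[k] = t
        images[name] = image
    return images
```

With a target at 10 m and a second one directly behind it at 20 m, only the front target receives returns, so the depth image of the occluded target stays empty.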
A simulation program for a recognition function module for images that change with displacement of vehicle position information, the program causing a computer to function as:
Position information acquisition means for acquiring position information of the vehicle;
Image generation means for generating, based on the position information acquired by the position information acquisition means, a simulation image that reproduces the area specified by the position information;
Image recognition means for recognizing and detecting a specific object in the simulation image generated by the image generation means, using the recognition function module;
Position information calculation means for generating a control signal for controlling the operation of the vehicle using the recognition result of the image recognition means, and for changing or correcting the position information of the own vehicle based on the generated control signal; and
Synchronization control means for controlling synchronization of the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means.
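The closed loop recited in the claim above (acquire position, generate image, recognize, compute control and update position, all kept in step) can be sketched as a fixed-step loop. This is a toy illustration, not the claimed implementation: the straight one-dimensional road, the one-pixel depth image, the detection range, the braking rule, and every name and constant below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    position: float  # metres along a straight road
    speed: float     # metres per second


OBSTACLE_AT = 100.0  # hypothetical obstacle position on the road, in metres


def acquire_position(state):
    # Position information acquisition.
    return state.position


def generate_image(position):
    # Stand-in for CG rendering: a one-pixel depth image holding the
    # distance from the vehicle to the obstacle.
    return [OBSTACLE_AT - position]


def recognize(image, detect_range=40.0):
    # Recognition: report the object's distance if it is within range.
    distance = min(image)
    return distance if distance <= detect_range else None


def calculate_position(state, detection, dt=0.1, brake=5.0):
    # Control signal: brake while an object is detected, then update
    # the vehicle position from the resulting speed.
    decel = brake if detection is not None else 0.0
    speed = max(0.0, state.speed - decel * dt)
    return VehicleState(state.position + speed * dt, speed)


def run(state, frames):
    # Synchronization control: each stage runs exactly once per frame,
    # in a fixed order, so every stage sees data from the same frame.
    log = []
    for frame in range(frames):
        position = acquire_position(state)
        image = generate_image(position)
        detection = recognize(image)
        state = calculate_position(state, detection)
        log.append((frame, position, detection is not None, state.speed))
    return state, log
```

Starting from `VehicleState(0.0, 10.0)`, the vehicle cruises until the obstacle enters the detection range, then brakes to a stop short of it. Driving all stages from a single frame counter is the simplest form of synchronization; a distributed setup would need explicit timestamps or barriers instead.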
The simulation program according to claim 25, wherein the image generation program according to claim 23 or 24 is provided as the image generation means, and
a depth image generated by the depth image generation unit of the image generation program is input to the image recognition means as the simulation image.
An image generation method for generating, as computer graphics, a virtual image to be input to sensor means, the method comprising:
A scenario creation step in which a scenario creation unit creates a scenario relating to the arrangement and behavior of objects present in the virtual image;
A 3D modeling step in which a 3D modeling unit performs modeling for each object based on the scenario;
A 3D shading step in which a 3D shading unit performs shading for each model generated in the 3D modeling step and generates a shading image for each model;
A component extraction step of extracting a predetermined component included in the shading image and outputting it as a component image; and
A depth image generation step in which a depth image generation unit generates a depth image in which depth is defined based on information on the three-dimensional shape of each object in the component image.
An image generation method for generating, as computer graphics, a virtual image to be input to sensor means, the method comprising:
A scenario creation step in which a scenario creation unit creates a scenario relating to the arrangement and behavior of objects present in the virtual image;
A 3D modeling step in which a 3D modeling unit performs modeling for each object based on the scenario;
A 3D shading step in which a 3D shading unit performs shading for each model generated in the 3D modeling step and generates a shading image for each model; and
A depth image generation step in which a depth image generation unit generates a depth image in which depth is defined based on information on the three-dimensional shape of each object in the shading image,
wherein, in the 3D shading step, the 3D shading unit:
performs shading only on a predetermined portion of the model at which light rays emitted from the sensor means are reflected; and
outputs only the three-dimensional shape of the predetermined portion, and
in the depth image generation step, the depth image generation unit generates a depth image for each object based on information on the three-dimensional shape of the predetermined portion.
A simulation method for a recognition function module for images that change with displacement of vehicle position information, the method comprising:
A position information acquisition step in which position information acquisition means acquires position information of the vehicle;
An image generation step in which image generation means generates, based on the position information acquired in the position information acquisition step, a simulation image that reproduces the area specified by the position information;
An image recognition step in which image recognition means recognizes and detects a specific object in the simulation image generated in the image generation step;
A position information calculation step in which position information calculation means generates a control signal for controlling the operation of the vehicle using the recognition result of the image recognition step, and changes or corrects the position information of the own vehicle based on the generated control signal; and
A synchronization control step in which synchronization control means controls synchronization of the position information acquisition means, the image generation means, the image recognition means, and the position information calculation means.
The simulation method according to claim 29, wherein the image generation method according to claim 27 or 28 is included as the image generation step, and
a depth image generated by the depth image generation unit in the image generation method is input to the image recognition means as the simulation image.