WO2020179605A1 - Learning apparatus and learning method - Google Patents

Learning apparatus and learning method

Info

Publication number
WO2020179605A1
WO2020179605A1 (PCT/JP2020/007982)
Authority
WO
WIPO (PCT)
Prior art keywords
information
output
learning
synthesizer
learners
Application number
PCT/JP2020/007982
Other languages
French (fr)
Japanese (ja)
Inventor
敦丈 小菅
幸徳 赤峰
俊 大島
敬亮 山本
Original Assignee
Hitachi, Ltd. (株式会社日立製作所)
Application filed by Hitachi, Ltd.
Publication of WO2020179605A1 publication Critical patent/WO2020179605A1/en

Classifications

    • G PHYSICS
    • G03 PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03B APPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B7/00 Control of exposure by setting shutters, diaphragms or filters, separately or conjointly
    • G03B7/08 Control effected solely on the basis of the response, to the intensity of the light received by the camera, of a built-in light-sensitive device
    • G03B7/091 Digital circuits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning



Abstract

The present invention automatically performs relearning and additional learning in response to changes in the environment and usage conditions. The learning apparatus comprises: a plurality of learners 102, each of which receives signals from one of a plurality of sensors 101 and whose state is determined by teacher data; a synthesizer 107 that combines the information output from the plurality of learners and outputs output information including an identification result; and a feedback system 110 that supplies the output information from the synthesizer to the plurality of learners. The plurality of learners 102 learn using the output information obtained via the feedback system as teacher data.

Description

Learning device and learning method
The present invention relates to a learning device and a learning method, and in particular to techniques for self-learning using a plurality of sensors.
In a picking robot in a distribution warehouse or an autonomous vehicle traveling on public roads, a recognition unit uses sensor data obtained from various kinds of sensors to recognize the surroundings, and device control is performed based on the result. Known sensors for recognizing surrounding objects include RGB cameras, infrared cameras, LiDAR, and millimeter-wave radar. The recognition computation uses machine learning algorithms that extract and identify features from the sensor data. Because a single sensor can recognize only a limited range, set of objects, and set of environments, sensor fusion techniques have been developed in which multiple sensors are combined to operate complementarily.
To identify objects accurately with machine learning, a learner trained in advance on a large amount of teacher data is used. Because the learning parameters are optimized during training for the environments and objects contained in the training data set, the algorithm cannot correctly handle aging degradation, fluctuations in the surrounding environment, or environments it has never learned. In such cases, training data must be prepared again and the learner retrained.
Regarding self-relearning in machine learning and artificial-intelligence fusion techniques, Patent Document 1 is one example. In Patent Document 1, two separate processing functions, artificial intelligence 1 and artificial intelligence 2, each output a result for the same input, and the results are compared to determine the processing to execute next. Because evaluations are obtained from multiple viewpoints, the reliability of the processing operation can be improved.
Patent Document 2 discloses improving the efficiency of a plurality of learners in a computer network. In this technique, each learner first sends performance-score information to a control unit; based on the received scores, the control unit computes a degree of freedom and a reliability for each learner. Efficiency is said to be improved by operating each learner according to its degree of freedom.
Patent Document 1: Japanese Patent Laid-Open No. 2018-151950. Patent Document 2: US Patent US9836696.
In the technique of Patent Document 1, the artificial intelligence has no function for receiving an external signal and autonomously, adaptively adjusting its recognition and judgment. The environmental conditions under which reliability can be improved are therefore considered limited. Moreover, it does not relearn autonomously without receiving the comparison result as a feedback signal. In addition, since no processing quantifies the reliability of the processing operation or the accuracy of the AI outputs, the judgment and accuracy when artificial intelligence 1 and artificial intelligence 2 output different results remain problems.
In the technique of Patent Document 2, a feedback loop is formed by the performance-score information sent from each learner to the control unit and the degree-of-freedom information sent from the control unit to each learner (the learning system). Its main purpose is efficiency through optimization of the learning system as a whole, not retraining of the individual learners. If each learner could improve its accuracy individually, however, further gains in accuracy could be expected.
Thus, with the conventional techniques of Patent Documents 1 and 2, the accuracy of the learners degrades when the environment or usage conditions change. For a supervised learner, when the environment actually changes in ways the teacher data did not anticipate, the learner's correct-answer rate degrades and its false-alarm probability increases.
An object of the present invention is to provide a learning device and a learning method capable of automatically performing relearning and additional learning in response to changes in the environment and the like.
According to a preferred example of the learning device of the present invention, the device comprises: a plurality of learners, each of which receives signals from one of a plurality of sensors and whose state is determined by teacher data; a synthesizer that combines the information output from the plurality of learners and outputs output information including an identification result; and a feedback system that supplies the output information from the synthesizer to the plurality of learners. The plurality of learners learn using the output information obtained via the feedback system as teacher data. The present invention can also be understood as a learning method executed by the learning device.
According to the present invention, accuracy can be improved even for a learner with insufficient teacher data, and relearning and additional learning can be performed automatically in response to changes in the environment and the like.
FIG. 1 is a block diagram of a self-learning system according to Embodiment 1. FIG. 2 shows the correct-answer rate of a learner when the environment changes. FIG. 3 is a block diagram of a self-learning system according to Embodiment 2. FIG. 4 is a block diagram of a self-learning system according to Embodiment 3.
Preferred embodiments are described below with reference to the drawings.
FIG. 1 shows an example of a self-learning system to which a self-learning device is applied. The self-learning system mainly comprises a plurality of sensors 101, a plurality of learners 102 placed at the outputs of the sensors 101, and a synthesizer 107 that combines the results of the plurality of learners 102. The self-learning system is realized by a computer executing a program. Here, the configuration comprising the plurality of learners 102 and the synthesizer 107 may be called a self-learning device; alternatively, the sensors 101 may also be included in that term. Since the synthesizer 107 learns by itself how to combine, it may also be called a learning synthesizer or a synthesizing learner.
The sensor 101 comprises, for example, a sensor element that detects physical or chemical information; a controller, built around a microprocessor, that monitors the state of the sensor element and processes the data it detects; a memory that stores the controller's programs and the detection data; and a communication unit that exchanges control information and detection data with the learning device.
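As a rough Python sketch of this composition (a minimal model under editor-chosen names; none of the identifiers come from the patent), the four parts map naturally onto one small class:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Sensor:
    """Minimal model of a sensor 101: element, controller, memory, communication."""
    read_element: Callable[[], list]                  # sensor element: raw detection data
    memory: List[list] = field(default_factory=list)  # memory: stored detection data

    def controller_step(self) -> list:
        """Controller: monitor the element and process the data it detects."""
        raw = self.read_element()
        processed = [v for v in raw if v is not None]  # placeholder processing
        self.memory.append(processed)
        return processed

    def transmit(self) -> list:
        """Communication unit: hand the latest detection data to the learner."""
        return self.memory[-1] if self.memory else []

cam = Sensor(read_element=lambda: [0.3, None, 0.7])
cam.controller_step()
print(cam.transmit())  # [0.3, 0.7]
```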
For detecting objects in an automotive setting, a combination of an RGB camera and a millimeter-wave radar, for example, can be considered as the plurality of sensors 101. The RGB camera outputs color image data; the millimeter-wave radar outputs a point cloud in three-dimensional spatial coordinates with a velocity attached to each point. When the same target is detected by both the RGB camera and the millimeter-wave radar and then classified, high classification accuracy can be expected if the different characteristics of the two sensors are combined well. One way to combine the two sensor outputs would be to convert the RGB image into three-dimensional spatial coordinates using parallax, merge the result with the radar point cloud, and classify with a single learner. However, time and space synchronization between heterogeneous sensors is not easy, and the signal-processing load of the coordinate conversion is enormous and impractical. It is therefore preferable to provide a learner 102 for each sensor 101, classify per sensor, and then make an integrated judgment in the synthesizer 107 based on the per-sensor results. In this case, each learner 102 outputs its own classification as an identification result 103. The identification result may also be called an estimation result.
The synthesizer 107 combines and estimates using, for example, a majority vote. Using majority voting is expected to improve inference accuracy. On the other hand, some sensors are biased toward erroneous estimates under specific environments. For example, an RGB camera loses accuracy in dark or backlit scenes, so the confidence of the estimates classified by its learner 102 deteriorates; a millimeter-wave radar is attenuated by rainfall, so its confidence deteriorates in rain. To avoid such sensor-specific accuracy degradation under specific conditions, and to further improve inference accuracy through combination, it is effective to weight the identification result 103 of each learner according to the environment. For example, when the sensor 101 (S1) is an RGB camera, brightness is applied as environment information 105 to a multiplier 106 (this part is called the application unit) and reflected as a weighting coefficient on the accuracy information 104. When the sensor 101 (S2) is a millimeter-wave radar, rainfall information is applied as environment information 105 to the multiplier 106 and reflected as a weighting coefficient on the accuracy information 104. The synthesizer 107 then combines the accuracy information 104 with these environment-dependent weights already reflected, which improves accuracy further. The environment information 105 is acquired by a plurality of external sensors 111, although not necessarily: rainfall information, for instance, may instead come from an external organization such as the Meteorological Agency or from an external device.
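A minimal Python sketch of this environment-weighted combination: two learners' per-class confidences (the accuracy information 104) are scaled by environment-derived weights (the multiplier 106) before the synthesizer 107 sums them and takes the arg-max. The specific weighting functions for brightness and rainfall are illustrative assumptions, not values from the patent.

```python
import numpy as np

def env_weight_rgb(brightness: float) -> float:
    """Illustrative: down-weight the RGB camera in dark scenes (0..1 scale)."""
    return float(np.clip(brightness, 0.1, 1.0))

def env_weight_radar(rain_mm_per_h: float) -> float:
    """Illustrative: down-weight the radar as rainfall attenuates it."""
    return float(np.exp(-0.1 * rain_mm_per_h))

def synthesize(confidences: list, weights: list) -> int:
    """Synthesizer 107: weighted sum of per-learner class confidences.

    With all weights equal this reduces to a soft majority vote."""
    combined = sum(w * c for w, c in zip(weights, confidences))
    return int(np.argmax(combined))

# Two learners, three classes (e.g. pedestrian / vehicle / background).
conf_rgb = np.array([0.5, 0.3, 0.2])    # accuracy information 104 from learner S1
conf_radar = np.array([0.2, 0.6, 0.2])  # accuracy information 104 from learner S2
weights = [env_weight_rgb(brightness=0.2), env_weight_radar(rain_mm_per_h=0.0)]
print(synthesize([conf_rgb, conf_radar], weights))  # radar dominates in the dark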
Placing the optimal learner at each sensor's output, matched to the characteristics of that output, is important both for the accuracy of the identification results and for reducing the signal-processing load. For classification, a machine learning algorithm called a CNN (Convolutional Neural Network) is often used for color image data, while a comparatively lightweight algorithm called an SVM (Support Vector Machine) is often used for point-cloud data. With these learning algorithms, an approximate confidence can be extracted for the identification result 103: from the correlation magnitude in a CNN, or from the distance to the decision threshold in an SVM. By outputting this accuracy information 104 from each learner 102 and performing weighted combination in the synthesizer 107, higher accuracy can be achieved than from the identification result of any single learner 102.
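One plausible way to read out such approximate confidences — the softmax peak for a CNN's logits, and the distance from the decision boundary for an SVM — is sketched below with scikit-learn. The logistic squashing of the SVM margin is a common heuristic assumed here, not something specified by the patent.

```python
import numpy as np
from sklearn.svm import SVC

def cnn_confidence(logits: np.ndarray) -> float:
    """Approximate confidence of a CNN: peak of the softmax distribution."""
    e = np.exp(logits - logits.max())
    return float((e / e.sum()).max())

def svm_confidence(clf: SVC, x: np.ndarray) -> float:
    """Approximate confidence of an SVM: distance from the decision threshold,
    squashed to (0, 1) with a logistic function (a common heuristic)."""
    margin = float(clf.decision_function(x.reshape(1, -1))[0])
    return 1.0 / (1.0 + np.exp(-abs(margin)))

# Toy binary point-cloud features: two separated clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) + np.repeat([[0, 0, 0], [2, 2, 2]], 50, axis=0)
y = np.repeat([0, 1], 50)
clf = SVC(kernel="linear").fit(X, y)
print(svm_confidence(clf, X[0]), cnn_confidence(np.array([2.0, 0.5, -1.0])))
```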
Furthermore, if the weighting coefficients derived from the environment information 105 are folded into the accuracy information 104 and the synthesizer 107 performs weighted combination, a highly accurate identification result 108 that is robust to environmental change can be obtained. The synthesizer 107 need not use a simple majority vote; a machine learning algorithm such as a CNN or an SVM can be used instead. An optimal nonlinear function for each task can then be obtained through learning, and higher accuracy can be expected. In this case, the error with respect to the training data is fed back to the synthesizer 107 by the feedback system 110, and the weighting coefficients 112 are updated by learning. Here, the error means the difference between the identification result 108 and training data given from outside; for example, the difference between the estimated position coordinates of a person and the correct coordinates recorded in advance in the training data. The error is computed, for example, by the synthesizer 107 itself, or by an external function in a stage connected downstream of the identification result 108 of the synthesizer 107.
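A minimal sketch of such a learnable synthesizer: a logistic model over the learners' confidences whose coefficients (playing the role of the weighting coefficients 112) are updated by gradient descent on the error against externally supplied training data. The model form and learning rate are assumptions for illustration.

```python
import numpy as np

class LearnableSynthesizer:
    """Synthesizer 107 as a small trainable model instead of a majority vote."""

    def __init__(self, n_learners: int, lr: float = 0.1):
        self.w = np.ones(n_learners) / n_learners  # weighting coefficients 112
        self.lr = lr

    def predict(self, confidences: np.ndarray) -> float:
        """confidences: per-learner scores for one target, shape (n_learners,)."""
        z = float(self.w @ confidences)
        return 1.0 / (1.0 + np.exp(-z))            # identification result 108

    def update(self, confidences: np.ndarray, label: float) -> None:
        """Feed back the error against the training data (feedback system 110)."""
        err = self.predict(confidences) - label    # difference from correct data
        self.w -= self.lr * err * confidences      # gradient step on the weights

syn = LearnableSynthesizer(n_learners=2)
for _ in range(100):
    syn.update(np.array([0.9, 0.2]), label=1.0)   # learner 1 is the reliable one
print(syn.w)  # weight on learner 1 grows through learning
```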
To become even more robust to environmental change, the fundamental countermeasure is to perform additional learning or relearning on the learners 102.
FIG. 2 schematically shows the correct-answer rate of a learner 102 when the surrounding environment or the state of its sensor changes over time. In FIG. 2, the x-axis is elapsed time 201 and the y-axis is the correct-answer rate 202. Taking an RGB camera as an example, if the teacher data is color image data from an ideal RGB camera, then as dirt or fogging accumulates on the camera lens over time, spectral shifts and distortion arise in the colors and the learner's correct-answer rate degrades. Without additional learning or relearning, the correct-answer rate keeps degrading over time, as shown by dotted line 203. If, on the other hand, additional learning or relearning can be performed with data from the changed environment, that is, data taken through the aged lens, the correct-answer rate improves, as shown by solid line 204. To keep the correct-answer rate high at all times, additional learning or relearning would have to be performed every time the rate degrades, which requires an enormous amount of teacher data.
Exploiting the fact that the identification result 108 output by the synthesizer 107 is, by the law of large numbers, more accurate than the output result 103 of any single learner 102, the identification result 108 and the accuracy information 109 are fed back to each learner 102 by the feedback system 110 as teacher data. By retraining the learner 102 corresponding to each sensor 101 on the identification result 108 fed back to it, each learner can continually self-learn against environmental changes, including aging of its sensor. Combining these retrained learners in the synthesizer 107 then maintains even higher accuracy and correct-answer rate for the learning device as a whole, as shown by dotted line 205 in FIG. 2.
Each learner 102 receives the output identification result 108 and updates its own weighting coefficients. When a CNN or SVM algorithm is used in each learner, the weighting coefficients are updated by error backpropagation so that each learner's identification result correlates most strongly with the synthesizer output fed back via the feedback system 110.
If the accuracy or correct-answer rate at the output of the synthesizer 107 is poor, incorrect teacher data may be fed back frequently, creating a vicious cycle in the feedback loop. As a countermeasure, this cycle can be avoided by computing accuracy information 109 for the identification result 108 of the synthesizer 107 and controlling the system so that low-confidence data is not used as teacher data.
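Putting the last three paragraphs together, the following sketch shows the feedback loop with the confidence gate: the synthesizer's combined label (identification result 108) incrementally retrains each learner, but only when the combined confidence (accuracy information 109) clears a threshold. The equal-weight synthesizer, the gate value, and the use of scikit-learn's SGDClassifier (whose "log_loss" option needs scikit-learn ≥ 1.1) are the editor's assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

CONFIDENCE_GATE = 0.8   # assumed threshold; lower-confidence outputs are discarded

rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))  # two sensor streams
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)                      # shared ground truth

# One incremental learner 102 per sensor (log-loss gives predict_proba).
learners = [SGDClassifier(loss="log_loss").partial_fit(X, y, classes=[0, 1])
            for X in (X1, X2)]

def feedback_step(x1, x2):
    """One cycle of feedback system 110 on a fresh pair of sensor samples."""
    probs = [l.predict_proba(x.reshape(1, -1))[0] for l, x in zip(learners, (x1, x2))]
    combined = (probs[0] + probs[1]) / 2          # synthesizer 107 (equal weights)
    label, conf = int(combined.argmax()), float(combined.max())
    if conf < CONFIDENCE_GATE:                    # gate on accuracy information 109
        return None                               # avoid feeding back weak labels
    for l, x in zip(learners, (x1, x2)):          # identification result 108 becomes
        l.partial_fit(x.reshape(1, -1), [label])  # teacher data for each learner
    return label

print(feedback_step(rng.normal(size=4), rng.normal(size=4)))
```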
FIG. 3 shows a self-learning system according to Embodiment 2. As in Embodiment 1, it comprises a plurality of sensors 302, a plurality of learners 102 placed at the outputs of the respective sensors, and a synthesizer 107 that combines the results of the plurality of learners 102. The difference from Embodiment 1 is that the output of the synthesizer 107 is connected to each sensor 302 via a feedback system 301; the feedback signal from the synthesizer 107 is given to the controller of each sensor 302.
A combination of an RGB camera and an IR (infrared) camera is taken as an example of the plurality of sensors 302. In a scene that mixes extreme brightness and darkness, such as a spotlight at night, the RGB camera alone lacks the dynamic range: bright areas saturate to white ("blown highlights") or dark areas saturate to black ("crushed shadows"). One solution is, for example, to adjust the camera exposure so that bright areas remain visible to the RGB camera, and to rely on the IR camera, which has the advantage as a sensor, for the dark areas.
For example, to detect a person working around a spotlight at a nighttime construction site, the learner 102 placed at the RGB camera output and the learner 102 placed at the IR camera output each output an identification result 103 indicating whether a person is present. If the luminance of the surrounding pixels is included in the accuracy information 104, the synthesizer 107 can, when combining, reduce the weighting coefficient for pixels that exceed the dynamic range, and raise the IR camera's weighting coefficient for pixels that are dark in the RGB image. Furthermore, if the RGB camera's exposure and the IR camera's gain are fed back (301) based on the luminance-bearing accuracy information 104 input to the synthesizer 107, the controller of each sensor 302 can keep the sensors in the state that is always optimal for the system, making it possible to detect a person stably even under a nighttime spotlight.
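A sketch of how this second feedback path (301) might be computed: from per-pixel luminance statistics of the current RGB frame, derive exposure and gain commands for the sensor controllers. The control law — cutting RGB exposure when highlights clip, raising IR gain when shadows crush — and all thresholds are assumed for illustration only.

```python
import numpy as np

def sensor_control_feedback(luma: np.ndarray, rgb_exposure: float, ir_gain: float):
    """Derive feedback 301 from the luminance map of the current RGB frame.

    luma: per-pixel luminance in [0, 1]; returns updated (rgb_exposure, ir_gain)."""
    blown = float((luma > 0.98).mean())    # fraction of blown-highlight pixels
    crushed = float((luma < 0.02).mean())  # fraction of crushed-shadow pixels

    if blown > 0.05:                       # highlights clipping: expose for them
        rgb_exposure *= 0.8
    if crushed > 0.05:                     # dark regions: lean on the IR camera
        ir_gain *= 1.2
    return rgb_exposure, ir_gain

# A night spotlight scene: a bright patch on a dark background.
luma = np.full((480, 640), 0.01)
luma[150:330, 200:440] = 1.0
print(sensor_control_feedback(luma, rgb_exposure=1.0, ir_gain=1.0))  # (0.8, 1.2)
```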
FIG. 4 shows a self-learning system according to Embodiment 3. In recent years, RGB/IR integrated image sensors have been developed in which an image sensor for RGB and an image sensor for infrared (Infra Red, IR) are integrated on the same silicon chip. In this case, as shown in FIG. 4, the self-learning system can be built around an RGB/IR integrated camera 401 carrying such a sensor. Compared with Embodiment 2, the integrated camera gives the RGB image and the IR image the same focal length, the same angle of view, and the same shutter timing. This eliminates the complicated signal processing otherwise needed for viewpoint conversion and for the time conversion caused by shutter-timing differences between the RGB and IR images. Using the RGB exposure and IR gain contained in the feedback control signal 301, the camera controller 402 can control the gain 403, shutter speed 405, and aperture 406 for each of the RGB and IR images.
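The camera controller 402 might then split that feedback into per-channel parameters, as in this assumed sketch (field names follow the reference numerals: gain 403, shutter 405, aperture 406; the mapping from feedback to parameters is illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class ChannelParams:
    gain: float = 1.0        # programmable signal amplifier 403
    shutter: float = 1 / 60  # shutter speed 405, seconds (shared timing on one chip)
    aperture: float = 2.8    # aperture 406 (f-number)

@dataclass
class CameraController:
    """Camera controller 402 of the RGB/IR integrated camera 401 (sketch)."""
    rgb: ChannelParams = field(default_factory=ChannelParams)
    ir: ChannelParams = field(default_factory=ChannelParams)

    def apply_feedback(self, rgb_exposure: float, ir_gain: float) -> None:
        """Map feedback 301 onto the two channels sharing one shutter timing."""
        self.rgb.aperture /= rgb_exposure  # lower exposure -> larger f-number
        self.ir.gain *= ir_gain

ctrl = CameraController()
ctrl.apply_feedback(rgb_exposure=0.8, ir_gain=1.2)
print(ctrl.rgb.aperture, ctrl.ir.gain)  # 3.5, 1.2
```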
Some modifications based on Embodiments 1 to 3 are described below.
Embodiment 2 assumes that the identification result 108 and the accuracy information 109 output by the synthesizer 107 are fed back (110) to each of the learners 102 as training data, and additionally feeds the camera exposure and gain signals back from the synthesizer 107 to the plurality of sensors 302 as sensor control signals. In one modification, the training-data feedback (110) is stopped and only the sensor control signals are fed back to each sensor 302. The effect of the training-data feedback of Embodiment 1 is then lost, but the effect of Embodiment 2 is still obtained.
As yet another modification, the training-data feedback (110) of Embodiment 1 can be stopped, and only the application of the environment information 105 retained so that the weighting coefficients are reflected in the accuracy information 104 of each learner 102. By combining the accuracy information 104 with these environment-dependent weights reflected, the synthesizer 107 can still realize highly accurate learning.
101: sensor
102: learner
103: identification result
104: accuracy information
105: environment information
106: multiplier
107: synthesizer
108: estimation result (identification result)
109: accuracy information
110: feedback
111: sensor for acquiring environment information
112: weighting coefficient of the learning algorithm
201: elapsed time
202: correct-answer rate
203: correct-answer rate without relearning
204: correct-answer rate with relearning
205: correct-answer rate of Embodiment 1
301: feedback
302: sensor
401: RGB/IR integrated camera
402: camera controller
403: programmable signal amplifier
404: RGB/IR integrated image sensor
405: shutter
406: aperture

Claims (13)

  1. A learning device comprising:
    a plurality of learners that each receive signals from one of a plurality of sensors and whose states are determined by teacher data;
    a synthesizer that combines information output from the plurality of learners and outputs output information including an identification result; and
    a feedback system that supplies the output information output from the synthesizer to the plurality of learners,
    wherein the plurality of learners learn using the output information obtained via the feedback system as teacher data.
  2. The learning device according to claim 1, wherein each of the plurality of learners outputs an identification result and accuracy information, and the synthesizer performs combination processing based on the identification results and the accuracy information output from the plurality of learners and outputs an identification result and accuracy information.
  3. The learning device according to claim 2, wherein each of the plurality of learners learns based on the identification result and the accuracy information provided via the feedback system.
  4. The learning device according to claim 2, wherein each of the plurality of learners has a weighting coefficient and updates the weighting coefficient according to the identification result and the accuracy information provided via the feedback system.
  5. The learning device according to claim 2, wherein the synthesizer has weighting coefficients corresponding to the plurality of learners and combines the output information from the plurality of learners according to the weighting coefficients.
  6. The learning device according to claim 5, wherein the synthesizer feeds back the error between the identification result, which is its own output, and training data, and updates the weighting coefficients by learning.
  7. The learning device according to claim 2, further comprising an application unit that applies environment information obtained from external sensors to the plurality of pieces of accuracy information output from the plurality of learners, wherein the synthesizer performs combination processing using the accuracy information weighted by the environment information in the application unit.
  8. The learning device according to claim 2, wherein the synthesizer combines the identification results and accuracy information input from the plurality of learners using a majority vote.
  9. The learning device according to claim 2, wherein the synthesizer outputs an identification result and accuracy information (first output information) together with control information for the plurality of sensors (second output information), the second output information is provided to the plurality of sensors via a second feedback system, and the plurality of sensors control themselves using the second output information obtained via the second feedback system.
  10. An RGB/IR integrated camera using the learning device according to claim 9, wherein the second output information includes an exposure of an RGB camera and a gain of an IR camera, and a control unit uses the exposure and the gain to control the gain and aperture for each RGB image and IR image.
  11. A learning device comprising:
    a plurality of learners that each receive signals from one of a plurality of sensors, whose states are determined by teacher data, and that output identification results and accuracy information;
    a synthesizer that performs combination processing based on the identification results and accuracy information output from the plurality of learners, and outputs first output information, including an identification result and accuracy information, together with control information for the plurality of sensors (second output information); and
    a second feedback system that provides the second output information to the plurality of sensors,
    wherein the plurality of sensors control themselves using the second output information obtained via the second feedback system.
  12. A learning device comprising:
    a plurality of learners that each receive signals from one of a plurality of sensors, whose states are determined by teacher data, and that output identification results and accuracy information;
    a synthesizer that performs combination processing based on the identification results and accuracy information output from the plurality of learners and outputs output information including an identification result and accuracy information; and
    an application unit that applies environment information obtained from external sensors to the plurality of pieces of accuracy information output from the plurality of learners,
    wherein the synthesizer performs combination processing using the accuracy information weighted by the environment information in the application unit.
  13. A learning method comprising:
    a step in which a plurality of learners each receive signals from one of a plurality of sensors, have their states determined by teacher data, and output identification results and accuracy information;
    a step in which a synthesizer combines the information output from the plurality of learners and outputs output information including an identification result;
    a step of feeding back the output information output from the synthesizer and providing it to the plurality of learners; and
    a step in which the plurality of learners learn using the fed-back output information as teacher data.
PCT/JP2020/007982 2019-03-01 2020-02-27 Learning apparatus and learning method WO2020179605A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019037788A JP2020140644A (en) 2019-03-01 2019-03-01 Learning device and learning method
JP2019-037788 2019-03-01

Publications (1)

Publication Number Publication Date
WO2020179605A1 true WO2020179605A1 (en) 2020-09-10

Family

ID=72264960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/007982 WO2020179605A1 (en) 2019-03-01 2020-02-27 Learning apparatus and learning method

Country Status (2)

Country Link
JP (1) JP2020140644A (en)
WO (1) WO2020179605A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022107595A1 (en) * 2020-11-17 2022-05-27 ソニーグループ株式会社 Information processing device, information processing method, and program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679677A (en) * 2013-12-12 2014-03-26 杭州电子科技大学 Dual-model image decision fusion tracking method based on mutual updating of models
WO2018163725A1 (en) * 2017-03-08 2018-09-13 ソニー株式会社 Image processing device, image processing method, and program
JP2020035443A (en) * 2018-08-24 2020-03-05 株式会社豊田中央研究所 Sensing device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679677A (en) * 2013-12-12 2014-03-26 杭州电子科技大学 Dual-model image decision fusion tracking method based on mutual updating of models
WO2018163725A1 (en) * 2017-03-08 2018-09-13 ソニー株式会社 Image processing device, image processing method, and program
JP2020035443A (en) * 2018-08-24 2020-03-05 株式会社豊田中央研究所 Sensing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
REN OHMURA, NAOYUKI HASIDA, MICHITA IMAI: "Evaluation of activity recognition with auto-segmentation of accelerometer data", IPSJ SIG TECHNICAL REPORT, vol. 2010-UBI-27, no. 11, 15 August 2010 (2010-08-15), pages 1-7, XP055736215, ISSN: 1884-0930 *

Also Published As

Publication number Publication date
JP2020140644A (en) 2020-09-03

Similar Documents

Publication Publication Date Title
KR20220143617A (en) Method for acquiring movement attributes of moving object and apparatus for performing the same
US20190204834A1 (en) Method and apparatus for object detection using convolutional neural network systems
US20190086546A1 (en) Processing method of a 3d point cloud
CN100579174C (en) Motion detection method and device
CN108377380B (en) Image scanning system and method thereof
EP2065842B1 (en) Adaptive driver assistance system with robust estimation of object properties
WO2019154541A1 (en) Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
WO2017155602A1 (en) Systems and methods for normalizing an image
US11585896B2 (en) Motion-based object detection in a vehicle radar using convolutional neural network systems
EP1255177A2 (en) Image recognizing apparatus and method
Haghbayan et al. An efficient multi-sensor fusion approach for object detection in maritime environments
CN105335955A (en) Object detection method and object detection apparatus
US11455503B2 (en) Method and sensor apparatus for generating an object classification for an object
EP4139892A2 (en) Method and apparatus for camera calibration
US11852749B2 (en) Method and apparatus for object detection using a beam steering radar and a decision network
WO2020179605A1 (en) Learning apparatus and learning method
CN110610130A (en) Multi-sensor information fusion power transmission line robot navigation method and system
US20200089315A1 (en) Systems and methods for capturing training data for a gaze estimation model
US20200401151A1 (en) Device motion control
Begum et al. Deep learning models for gesture-controlled drone operation
CN117218380A (en) Dynamic target detection tracking method for unmanned ship remote sensing image
Muhammad et al. Visual object detection based lidar point cloud classification
Vasco et al. Vergence control with a neuromorphic iCub
Xu et al. Adaptive brightness learning for active object recognition
CN112598742A (en) Stage interaction system based on image and radar data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20766085

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20766085

Country of ref document: EP

Kind code of ref document: A1