EP4374335A1 - Electronic device and method - Google Patents

Electronic device and method

Info

Publication number
EP4374335A1
Authority
EP
European Patent Office
Prior art keywords
depth data
ann
criterion
intended functionality
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22751033.6A
Other languages
German (de)
French (fr)
Inventor
Stefaan VERSCHUERE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Depthsensing Solutions NV SA
Sony Semiconductor Solutions Corp
Original Assignee
Sony Depthsensing Solutions NV SA
Sony Semiconductor Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Depthsensing Solutions NV SA and Sony Semiconductor Solutions Corp
Publication of EP4374335A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/09 - Supervised learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Lock And Its Accessories (AREA)

Abstract

An electronic device (103) comprising circuitry configured to obtain depth data; perform (504) classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data, perform an intended functionality if a primary criterion or secondary criterion is fulfilled; determine that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determine that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism.

Description

ELECTRONIC DEVICE AND METHOD
TECHNICAL FIELD
The present disclosure generally pertains to the field of artificial intelligence, in particular to operation and training of an artificial neural network, and in particular to production and storage of training data for artificial neural networks. The present disclosure may for example be applied in an automotive context and with respect to depth images obtained by a Time-of-Flight camera.
TECHNICAL BACKGROUND
A Time-of-Flight (ToF) sensor is a range imaging camera system that determines the distance of objects by measuring the time of flight of a light signal between the camera and the object for each point of the image. Depth data obtained by a ToF camera may for example be used in the automotive industry in several applications like in-cabin monitoring (ICM), face identification for access control or engine starting, gesture recognition for control of the car infotainment system etc. The depth data obtained by a ToF camera is typically analyzed by the application with the help of an artificial neural network (ANN). The depth data is typically stored temporarily in a volatile memory to be processed by the ANN. Depth data may occupy a significant amount of memory, for example the raw data image of a 1 Megapixel iToF sensor may occupy 128 Mbit. After the ANN processing, the depth data is typically released from memory to free up working memory.
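As a plausibility check of the 128 Mbit figure, a short calculation in Python; the 4 phases x 2 taps per pixel follow the raw data description further below, while the 16-bit raw sample width is an assumption typical for iToF sensors and not fixed by this disclosure:

    # Rough size of one raw iToF frame for a 1 Megapixel sensor
    pixels = 1_000_000
    samples_per_pixel = 4 * 2        # 4 phases, 2 taps per pixel (see raw data description below)
    bits_per_sample = 16             # assumed container width of one raw sample
    frame_bits = pixels * samples_per_pixel * bits_per_sample
    print(frame_bits / 1e6, "Mbit")  # -> 128.0 Mbit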
An ANN is trained e.g. by supervised learning, i.e. with a collection of labelled training examples. That means that each depth image (or sequence of depth images) is labelled with the specified classification result to which the depth image should lead. An ANN which is trained with too few or wrongly labelled images may perform sub-optimally at initial deployment and may show so-called false negative failures during operation in an application.
Therefore, it is generally desirable to provide techniques which improve training and operation of ANN and the attaining and storing of training data for the ANN.
SUMMARY
According to a first aspect the disclosure provides an electronic device comprising circuitry configured to obtain depth data; perform classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data; perform an intended functionality if a primary criterion or a secondary criterion is fulfilled; determine that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determine that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; and determine, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative. According to a further aspect the disclosure provides a method comprising obtaining depth data; performing classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data; performing an intended functionality if a primary criterion or a secondary criterion is fulfilled; determining that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determining that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; and determining, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.
Further aspects are set forth in the dependent claims, the following description and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
Fig. 1 schematically shows an embodiment of a system for obtaining and storing ToF sensor depth images and using it for training and operation of an ANN in an automotive environment;
Fig. 2 schematically shows a layered architecture of an automotive software platform;
Fig. 3 shows an exemplifying architecture of a convolutional neural network for image classification;
Fig. 4 schematically shows a sliding window operation for determining depth data classified as false negative by an ANN;
Fig. 5 shows a flowchart of capturing false negative depth data with respect to an intended functionality;
Fig. 6 schematically shows a feedback loop of capturing false negative classified ToF depth data and a re-training of an ANN with respect to an intended functionality;
Fig. 7 is a block diagram depicting an example of schematic configuration of a vehicle control system as an example of a mobile body control system to which the capturing of false negative depth data with respect to an intended functionality can be applied; and
Fig. 8 is a diagram explaining an example of installation positions of an outside-vehicle information detecting section and an imaging section.
DETAILED DESCRIPTION OF EMBODIMENTS
The embodiments disclose an electronic device comprising circuitry configured to obtain depth data; perform classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data; perform an intended functionality if a primary criterion or a secondary criterion is fulfilled; determine that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determine that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; and determine, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.
The circuitry of the electronic device may comprise one or more microprocessors, microcontrollers, one or more electronic control units (ECUs), communication buses (e.g. CAN, FlexRay, Ethernet), system-on-chip (SoC) technology, FPGA technology, GPU technology (or a GPU optimized for ANN processing), and/or an imaging sensor of an imaging camera, in particular a sensor of a ToF camera. The circuitry of the electronic device may also include electronic components such as switching elements (gates, transistors, etc.), resistors, memory elements (capacitors, volatile memory like RAM/SDRAM, non-volatile memory like ROM or the like), pixel circuitry, a storage, input means (mouse, keyboard, camera, etc.), output means (display (e.g. liquid crystal, (organic) light emitting diode, etc.), loudspeakers, etc.), a (wireless) interface, etc., as it is generally known for electronic devices (computers, smartphones, etc.). Moreover, it may include sensors for sensing still image or video image data (image sensor, camera sensor, video sensor, etc.), for sensing a fingerprint, for sensing environmental parameters (e.g. radar, humidity, light, temperature), etc.
According to the embodiments, the circuitry may be configured to re-train the ANN if the depth data is classified as false negative.
According to the embodiments, the circuitry may be configured to store the obtained depth data on a volatile memory. A volatile memory may be a computer memory that requires power to maintain the stored information and that retains its contents while powered on; when the power is interrupted, the stored data is lost. A volatile memory may for example be a general-purpose random-access memory (RAM), a double data rate (DDR) synchronous dynamic random-access memory (SDRAM), or the like.
According to the embodiments, the circuitry may be configured to store the depth data on a non-volatile memory if the depth data is classified as false negative. A non-volatile memory may be any type of computer memory that can retain stored information even after power is removed.
Examples of non-volatile memory include flash memory, read-only memory (ROM), ferroelectric RAM, most types of magnetic computer storage devices (e.g. hard disk drives, floppy disks, and magnetic tape), optical discs, and computer storage methods such as paper tape and punched cards.
The depth data may for example comprise one or more images comprising depth information, like raw depth data from a ToF sensor, a photon count value image, an amplitude/confidence map, a depth map or the like, and it may also comprise a sequence of depth images. For an iToF sensor, each pixel may generate eight individual raw data values (4 phases, 2 taps/pixel) per image. If the iToF sensor comprises signal processing capabilities, it may convert this raw data into I/Q values (phase, amplitude information) and then into depth/confidence information for the image. For a dToF sensor the raw data may consist of a photon count value. If the dToF sensor comprises signal processing capabilities, it may convert the photon count values into a depth map.
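The conversion chain described above (raw values, I/Q values, amplitude/confidence, depth) can be sketched as follows in Python with NumPy; the 4-phase demodulation relations and the modulation frequency are standard iToF assumptions used only for illustration and are not prescribed by this disclosure:

    import numpy as np

    C = 299_792_458.0   # speed of light in m/s
    F_MOD = 20e6        # assumed modulation frequency of the iToF illumination

    def itof_raw_to_depth(q0, q90, q180, q270):
        """Convert four phase raw frames of an iToF sensor into I/Q values,
        an amplitude/confidence map and a depth map."""
        i = q0.astype(float) - q180.astype(float)    # in-phase component I
        q = q90.astype(float) - q270.astype(float)   # quadrature component Q
        phase = np.arctan2(q, i) % (2 * np.pi)       # phase shift of the reflected light
        confidence = 0.5 * np.hypot(i, q)            # amplitude, used as confidence map
        depth = C * phase / (4 * np.pi * F_MOD)      # depth map within the unambiguous range
        return depth, confidence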
The intended functionality may relate to any application, in particular to any application in an automotive context. For example, the intended functionality may relate to opening a door lock of a car.
The intended functionality may be performed if a primary or secondary criterion is fulfilled.
A primary criterion for performing the intended functionality may be based on the classification information obtained by the ANN. It may for example be determined that a primary criterion for performing the intended functionality is fulfilled based on classification information obtained by the ANN. For example, according to the embodiments the ANN may be configured to perform face identification, and the intended functionality may relate to opening a door lock of a car. In this example, a primary criterion for performing the intended functionality (e.g. opening the door lock) may relate to the classification information obtained by the face identification.
The classification information obtained by the ANN is here referred to as a primary mechanism or a first mechanism to determine if the intended functionality is to be performed.
A secondary mechanism may be any other mechanism which is suitable to determine if the intended functionality is to be performed.
A secondary criterion may be based on a secondary mechanism. It may for example be determined that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism. For example, a secondary criterion for performing the intended functionality may be the opening of the door lock with a key.
According to the embodiments the circuitry may be configured to determine that the classification of the depth data is false negative if the circuitry determines that the secondary criterion for performing the intended functionality is fulfilled based on the secondary mechanism within a predetermined time span after the circuitry determined that the primary criterion for performing the intended functionality is not fulfilled based on the classification information. For example, the depth data may be considered false negative if the primary criterion (ANN based) for opening the door lock by face recognition is not fulfilled (e.g. in case that a car owner is not correctly recognized by the ANN as an authorized person despite being registered as an authorized person). In such a situation, the circuitry may recognize the false negative classification if the car owner uses a key to unlock the door within a predetermined time after the false negative classification of his face by the ANN.
According to the embodiments the classification information may comprise a confidence value related to the intended functionality, and the primary criterion for performing the intended functionality may be fulfilled if the confidence value is above a predetermined acceptance threshold. For example, an ANN may be trained to provide two output classes, namely authorized and not authorized. If a depth image is presented to this ANN, it will provide a first confidence value (or probability) for the depth data indicating an authorized person and a second confidence value (or probability) for the depth data indicating an unauthorized person.
The ANN may be trained with respect to a certain intended functionality. The ANN may for example be of the classification type and may provide, via its output layer, probabilities or confidence values for a predefined number of classification results, here also referred to as classes. A class which is linked to the intended functionality is here referred to as intended class. That is, if the depth data is classified by the ANN into this intended class (the class's respective confidence value exceeds a predetermined acceptance threshold), then it may be determined that a primary criterion for performing the intended functionality is fulfilled, and the intended functionality is performed.
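In code, this decision amounts to a simple threshold comparison; a minimal Python sketch, assuming the ANN returns one confidence value per class (the function name, class names and example threshold are illustrative only):

    THR_ACC = 0.9  # example acceptance threshold Thr_Acc

    def primary_criterion_fulfilled(confidences: dict, intended_class: str) -> bool:
        """Primary criterion: the confidence value of the intended class exceeds the acceptance threshold."""
        return confidences.get(intended_class, 0.0) > THR_ACC

    # Example: face identification for opening the door lock of a car
    confidences = {"authorized": 0.94, "not_authorized": 0.06}
    if primary_criterion_fulfilled(confidences, "authorized"):
        print("perform intended functionality, e.g. open the door lock")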
The confidence value related to the intended functionality may refer to the confidence value of an intended class.
According to the embodiments the circuitry may be configured to keep the obtained depth data in the volatile memory for a predetermined time span after performing classification on the depth data by the ANN if the confidence value of the depth data for the intended class is below the predetermined acceptance threshold and above a predetermined monitoring threshold.
According to the embodiments the circuitry may be configured to store a label indicating a class related to the intended functionality together with the corresponding depth data as labeled depth data on the non-volatile memory.
According to the embodiments the circuitry may be configured to re-train the ANN based on the labeled depth data stored on the non-volatile memory.
According to the embodiments the depth data may be based on an image generated by a Time-of-Flight sensor.
According to the embodiments the intended functionality may relate to an automotive environment. Automotive environment means that the system may be used in any kind of automobile, electric vehicle, hybrid electric vehicle, motorcycle, bicycle, personal mobility vehicle, airplane, drone, ship, robot, construction machinery, agricultural machinery (tractors), truck, space vehicle, rail vehicle, water vehicle or the like.
According to the embodiments the circuitry may be configured to determine that, with respect to the secondary mechanism, the (secondary) criterion for opening the door lock is fulfilled if the key is used to manually unlock the car door.
"With respect to the secondary mechanism" may mean that the secondary mechanism is using a key to manually unlock the car door.
According to the embodiments the ANN may be configured to perform gesture recognition, and wherein the intended functionality relates to increasing a volume of an audio system in a car, and wherein the secondary criterion for performing the intended functionality is a criterion for increasing the volume.
According to the embodiments the circuitry may be configured to determine that, with respect to the secondary mechanism, the secondary criterion for increasing the volume is fulfilled if a knob or touch screen on the car's console is operated manually to increase the volume.
"With respect to the secondary mechanism" may mean that the secondary mechanism is operating a knob or touch screen on the car's console manually to increase the volume.
According to the embodiments the ANN may be a convolutional neural network.
According to the embodiments the ANN may be implemented in software or in hardware.
The embodiments further disclose a method comprising obtaining depth data; performing classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data; performing an intended functionality if a primary criterion or a secondary criterion is fulfilled; determining that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determining that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; and determining, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.
The embodiments are now described in more detail with reference to the accompanying drawings.
Fig. 1 schematically shows an embodiment of a system for obtaining and storing ToF sensor depth images and using them for training and operation of an ANN in an automotive environment. The system 100 for obtaining and storing ToF sensor depth images and using them for training and operation of an ANN may be implemented in a car. A ToF sensor 101 may be an indirect ToF (iToF) sensor or a direct ToF (dToF) sensor or the like. The ToF sensor 101 consists of an array of pixels, wherein each pixel outputs one or more analog voltage values that are converted via an analog-to-digital converter to digital values with a certain resolution which is expressed in a number of bits, for example 10 bits or 12 bits. This digital data is called raw data. The ToF sensor 101 comprises processing capabilities and converts the raw data into a depth map of the recorded scene. The depth map from the ToF sensor 101 is transmitted via the interface 102 to the host processor 103. The interface 102 may be a MIPI standardized interface, for example a camera serial interface (CSI), or it may be an I2C interface. The host processor 103 may be a CPU or a GPU or an FPGA or any other processor. The host processor 103 is connected with a volatile memory 104 via the interface 105, which is a Direct Memory Access (DMA) interface. The volatile memory 104 is an SDRAM memory and the host processor 103 stores the received depth image in the SDRAM 104. On the host processor 103 runs a software algorithm which executes the ANN, that is, the ANN instructions are executed on the depth map using its trained coefficients (stored in the SDRAM 104 or the NVM 106). That means the ANN may perform inference (for example a classification) on the depth map (or a sequence of depth images) within the car (for example face identification for opening a door of the car, gesture identification to change the volume of the audio system, driver drowsiness detection or the like) and output its result as a confidence value for further processing (for example send the result to an ECU 108). The host processor 103 is further connected to a non-volatile memory (NVM) 106 via an interface 107, which is a Queued Serial Peripheral Interface (QSPI). The host processor 103, the volatile memory 104 and the non-volatile memory 106 may be part of a microcontroller or a system-on-chip (SoC) 110. The host processor 103 is still further connected to an electronic control unit (ECU) 108 via an interface 109, which is a Controller Area Network bus (CAN). The ECU may be connected to different sensors and actuators within the car and controls several processes within the car (e.g. opening the lock of a door of the car, increasing the volume of the audio system or the like). The host processor 103 may receive information (for example the execution/triggering of a secondary mechanism, see below) from the ECU 108 about the sensors/actuators and the like (e.g. the start engine knob is pressed, or the key is put into the lock to unlock it, or the volume knob of the audio system was operated etc.). The ECU 108 may receive inference results of the ANN from the host processor 103.
In another embodiment a sequence of depth maps or the like may be processed by the ANN.
All the different types of depth information for a single depth image (raw data, photon count value, amplitude/confidence map, depth map etc.) or a sequence of depth images will be referred to as depth data.
If the ToF sensor 101 does not comprise signal processing capabilities, the host processor 103 may receive the raw data from the ToF sensor and determine the depth data. Some applications may depend on motion, that is, multiple frames need to be processed simultaneously. In an automotive environment (for example a car) certain actions (which may be implemented as applications, see Fig. 2) with a respective intended functionality may be performed (by the host processor 103 or by a car ECU), like for example opening a lock of a car door, starting the engine of a car, increasing the volume of an audio system in a car, applying the brake of a car or waking up a drowsy driver of a car (by sending out a stimulus signal). These actions/intended functionalities may be performed based on the fulfillment of different criteria. The fulfillment of a first criterion may be determined based on the classification information of the ANN (also referred to as a primary/first mechanism). The fulfillment of a secondary criterion may be determined based on a secondary mechanism. The secondary mechanism may be the manual triggering of the intended functionality (see detailed description below).
The CSI is a specification of the Mobile Industry Processor Interface (MIPI) Alliance. For example, the CSI-2 or CSI-3 or CCS version may be used.
In another embodiment the volatile memory 104 may be DRAM, SRAM, SGRAM.
The non-volatile memory (NVM) 106 may be for example a flash memory, a read-only memory (ROM), a ferroelectric RAM, a magnetic computer storage device (e.g. a hard disk drive, floppy disk or magnetic tape) or an optical disc.
In another embodiment the interface 107 between the host processor 103 and the NVM 106 may be another type of Serial Peripheral Interface (SPI) or a parallel interface or the like.
In another embodiment the interface 109 between the host processor 103 and ECU 108 of the car may be an Ethernet interface, a Local Interconnect Network bus (LIN), a FlexRay interface, a Serial Peripheral Interface (SPI), an interrupt request (IRQ) or the like.
In another embodiment the host processor 103 and the volatile memory may be part of a microcontroller or a system-on-chip (SoC) 110 and the non-volatile memory may be connected to the microcontroller or a system-on-chip (SoC) 110 via an I/O interface of the microcontroller or a system-on-chip (SoC) 110.
The ANN may for example be a convolutional neural network (CNN), a multi-layer perceptron (MLP), a recurrent neural network (RNN), a support vector machine (SVM), an autoencoder, a generative adversarial network (GAN), a deep neural network (DNN) or the like.
In another embodiment the ANN may be implemented as a hardware instantiation (e.g. GPU) which essentially executes the same algorithm as a software ANN but implemented in an IC.
In an automotive scenario, the communication between ToF sensor, processor, ANN, door lock and the like may be abstracted from the hardware by software layers. Fig. 2 schematically shows a layered architecture of an automotive software platform. An example of an automotive software platform such as schematically shown in Fig. 2 is for example described in “AUTOSAR, Layered Software Architecture, Classic Platform, Standard 4.3.1” in more detail.
An automotive software platform comprises a microcontroller/hardware layer 201, a basic software layer 202, a runtime environment layer 203 and an application layer 204. The different hardware components of the microcontroller/hardware layer 201, like a ToF sensor (101 in Fig. 1), ECUs/processors (103 in Fig. 1), SDRAM (104 in Fig. 1) and NVM (106 in Fig. 1) or the like, may relate to a distributed system. The basic software layer 202 provides several types of services, for example: Input/Output services which standardize the access to sensors, actuators and ECU onboard peripherals; Memory services which standardize the access to internal (e.g. SDRAM) / external memory (e.g. NVM); Crypto services which standardize the access to cryptographic primitives including internal/external hardware accelerators; Communication services which standardize the access to vehicle network systems, ECU onboard communication systems and ECU internal software; Off-board communication services which standardize access to Vehicle-to-X communication, in-vehicle wireless network systems and ECU off-board communication systems; System services, which provide standardizable (operating system, timers, error memory) and ECU-specific (ECU state management, watchdog manager) services and library functions; Driver services which contain the functionality to control and access an internal or an external device.
The runtime environment layer 203 realizes the communication between the basic software layer 202 and the different software components of the application layer 204. That is, the runtime environment layer 203 manages the inter- and intra-hardware (ECU) communication. The application layer 204 comprises several different applications with different intended functionalities to support vehicle functions, like for example an application for opening a lock of a door of the car, or an application for changing the volume of the audio system in the car, or an application for sending out a stimulus signal to the driver, or the like. Each of the applications may be activated (perform their intended functionality) based on the fulfillment of different criteria. For example, the fulfillment of a first criterion may be determined based on a classification information of an ANN for face identification (start engine, open lock of car door), of an ANN for gesture identification (changing the volume), of an ANN for driver drowsiness detection (sending out stimulus signal) or of an ANN for brake assistance (braking). The different ANNs, which are trained for different intended functionalities, may be implemented as applications within the application layer.
The runtime environment layer 203 defines so-called ports, and from the point of view of the application layer 204 devices such as sensors, controllers, ECUs, and the like are accessible as software components which communicate with each other through respective ports. For example, viewed from the application layer 204, the ToF camera is accessible as a software component where the internal functionality of the operating system or the communication protocol stack is largely hidden.
Therefore, the ToF depth data may be received by the ANN(s) through a port provided by the runtime environment layer 203. The ANN may then perform inference (for example classification) on the depth data and output its results, for example the confidence values of the intended classes (see Figs. 3 and 4), through a port to another application with a certain intended functionality.
Inference processing of the ANN
The depth data is sent from the ToF sensor 101 to an ANN instantiation via the MIPI interface 102, stored in the SDRAM 104 and processed by the ANN. The ANN performs inference on the depth data and determines for example classification information.
As described above, in an automotive environment certain actions with a respective intended functionality may be performed (for example by a car ECU or the host processor 103). These intended functionalities may for example be opening a lock of a car door, starting the engine of a car, increasing the volume of an audio system in a car, applying the brake of a car or sending out a stimulus signal to a drowsy driver to wake him up. These intended functionalities may be performed (for example by the ECU or the host processor 103) based on the fulfillment of one of a plurality of possible criteria. The fulfillment of a first criterion may be determined based on the classification information determined by the ANN.
That means each ANN may be trained with respect to a different intended functionality within the car. For example, the performing of the intended functionality of starting the car engine or opening the lock of the car door may be based on fulfillment of a first criterion based on a classification information of an ANN for face identification. The ANN for face identification may identify a depth image of a face of a person. For example, the performing of the intended functionality of increasing the volume of the audio system may be based on fulfillment of a first criterion based on a classification information of an ANN for gesture identification. The ANN for gesture identification may identify a sequence of depth images of a hand of a person. For example, the performing of the intended functionality of sending out a stimulus signal may be based on fulfillment of a first criterion based on a classification information of an ANN for driver drowsiness detection. The ANN for driver drowsiness detection may perform image analysis/recognition of depth images of the face/facial expression of the driver. For example, the performing of the intended functionality of braking may be based on fulfillment of a first criterion based on a classification information of an ANN for brake assistance. The ANN for brake assistance may analyze depth images of preceding vehicles. The ANNs, which are trained for different intended functionalities, may be implemented as applications within the application layer.
The ANN may for example perform a classification of the depth data (or of a sequence of depth images) into one of a number of classes. That is, the ANN performs classification of the depth data and determines classification information for the depth data. The classification information for the depth data that is determined by the ANN may be a certain value for each class for the depth data (see for example the Softmax function for the CNN in Fig. 3). The value of each class is also referred to as confidence value (or confidence level) of the class for the depth data. That is, the classification information for the depth data may be a confidence value for each class for the depth data. The class with the highest confidence value may indicate into which class the depth data is classified. The class into which the depth data is classified may be referred to as the classification result.
Each ANN, trained with respect to a certain intended functionality, may provide an output layer with classes, where a specific class is linked to the intended functionality. Such a class is referred to here as "class related to the intended functionality", or "intended class". This means that if the depth data is classified into an intended class (i.e. its respective confidence value or probability exceeds a predetermined acceptance threshold Thr_Acc, see also Fig. 4), then it is determined that the first criterion is fulfilled and the intended functionality is performed. An intended class may for example be the authorized-class (open lock of car door), the increase-volume-class (gesture recognition for increasing volume), the brake-class (brake assistance), or the drowsy-class (drowsiness detection).
In case that a sequence of depth images (for example for gesture recognition or the like) is classified, the ANN may determine a confidence value for each class for the sequence of depth images as a whole.
An example of an ANN is a convolutional neural network (CNN) which is exemplarily described next in Fig. 3.
Fig. 3 shows an exemplifying architecture of a convolutional neural network for image classification. An input image matrix 301 is input into the CNN, wherein each entry of the input image matrix 301 corresponds to one pixel of an image (for example the depth image of a face) which should be processed by the CNN. The value of each entry of the input image matrix 301 may be a 32-bit value, wherein each of the colours red, green, and blue and the depth information occupies 8 bits. In another embodiment the value of each entry of the input image matrix 301 may be a 16-bit value, wherein a grayscale value and the depth information respectively occupy 8 bits. A filter (also called kernel or feature detector) 302, which is a matrix with an uneven number of rows and columns (for example 3x3, 5x5, 7x7 etc.) and which may be symmetric or asymmetric (in audio applications it may be advantageous to use asymmetric kernels, as the audio waveform, and therefore also the depth data, may not be symmetric), is shifted from left to right and top to bottom such that the filter 302 is once centred over every pixel. At every shift the entries of the filter 302 are elementwise multiplied with the corresponding entries in the image matrix 301 and the results of all elementwise multiplications are summed up. The result of the summation generates an entry of a first layer matrix 303 which has the same dimension as the input image matrix 301. The position of the centre of the filter 302 in the input image matrix 301 is the same position where the generated result of the multiplication-summation as described above is placed in the first layer matrix 303. All rows of the first layer matrix 303 are placed next to each other to form a first layer vector 304. A nonlinearity (e.g. ReLU) may be placed between the first layer matrix 303 (convolutional layer) and the first layer vector 304 (affine layer). The first layer vector 304 is multiplied with a last layer matrix 305, which yields the result z. The last layer matrix 305 has as many rows as the first layer vector 304 has columns, and the number S of columns of the last layer matrix 305 corresponds to the S different classes into which the CNN should classify the input image matrix 301. The result z of the matrix multiplication between the first layer vector 304 and the last layer matrix 305 is input into a Softmax function. The
Softmax function is defined as σ(z)_i = e^(z_i) / Σ_{j=1}^{S} e^(z_j), with i = 1, ..., S, which yields a probability distribution over the S classes, i.e. the probability for each of the S different classes into which the CNN should classify the input image matrix 301. The probability value for each class may also be called the confidence value of the class for the image (i.e. for the depth data).
For example, S = 2, i.e. the depth data image corresponding to the input image matrix 301 should be classified into two classes, class_1 and class_2. This yields the probability P_class_1 that the input image matrix 301 belongs to class class_1 and the probability P_class_2 that the input image matrix 301 belongs to class class_2 (for binary classification problems, i.e. S = 2, only one output neuron with a sigmoid nonlinearity may be used; if the output is below 0.5 the input may be labelled as class_1 and if it is above 0.5 the input may be labelled as class_2).
The entries of the filter 302 and the entries of the last layer matrix 305 are the weights (coefficients) of the CNN which are trained during the training process. The CNN can be trained in a supervised manner, by feeding an input image matrix, which is labelled as corresponding to a certain class, into the CNN. The current output of the CNN in the training phase is input into a loss function and through a backpropagation algorithm the weights of the CNN are adapted. There exist several variants of the general CNN architecture described above. For example, multiple filters in one layer can be used and/or multiple layers can be used.
For example, the ANN (e.g. a CNN) may perform face identification, that is, it classifies depth data of a face of a person into one of two different classes, namely the authorized-class or the non-authorized-class. The ANN was trained with respect to the intended functionality of opening the lock of a car door. The criterion for performing the intended functionality may be fulfilled if the depth data is classified into the authorized-class with a confidence value (for example in case of a normalized softmax function) above the acceptance threshold Thr_Acc=0.9.
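A minimal sketch of such a network, assuming PyTorch; the input size, the single 3x3 filter and the two output classes mirror the simple architecture of Fig. 3 but are otherwise illustrative choices, not fixed by the disclosure:

    import torch
    import torch.nn as nn

    class DepthClassifier(nn.Module):
        """One convolutional layer (filter 302), ReLU, one affine layer (matrix 305) and Softmax."""
        def __init__(self, num_classes: int = 2, height: int = 64, width: int = 64):
            super().__init__()
            self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # output keeps the input dimensions
            self.relu = nn.ReLU()                                   # nonlinearity between the layers
            self.fc = nn.Linear(height * width, num_classes)        # last layer matrix with S columns

        def forward(self, depth_image):                              # depth_image shape: (N, 1, H, W)
            x = self.relu(self.conv(depth_image))                    # first layer matrix 303
            x = torch.flatten(x, start_dim=1)                        # first layer vector 304
            z = self.fc(x)                                           # result z
            return torch.softmax(z, dim=1)                           # one confidence value per class

    model = DepthClassifier()
    confidences = model(torch.rand(1, 1, 64, 64))  # e.g. tensor([[0.52, 0.48]])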
In another example, the ANN (e.g. a CNN) may perform driver drowsiness detection, that is, it classifies the depth data (for example of a face of a person) into one of three classes, namely the sleeping-class (driver is sleeping), the attentive-class (driver is attentive), and the drowsy-class (driver is drowsy). The ANN was trained with respect to the intended functionality of sending out a stimulus signal (e.g. flashing the interior lights in the car) to stop the driver from falling asleep/waking the sleeping driver up. The criterion for performing the intended functionality may be fulfilled if the depth data is classified into the drowsy-class with a confidence value (for example in case of a normalized softmax function) above the acceptance threshold Thr_Acc=0.7.
In another example, the ANN (e.g. a CNN) may perform gesture recognition for increasing (or decreasing) the audio volume, that is, it classifies a sequence of depth images (for example a hand gesture of a person) into one of two classes, namely the increase-volume-class and the decrease-volume-class. The ANN may be trained with respect to the intended functionality of increasing (decreasing) the volume of the audio system in the car. The criterion for performing the intended functionality may be fulfilled if the depth data is classified into the increase-volume-class with a confidence value (for example in case of a normalized softmax function) above the acceptance threshold Thr_Acc=0.6.
Determination of false negatives
After the ANN has carried out the inference (for example a classification) on the depth data (or sequence of depth images), the host processor 103 may release the depth data (or sequence of depth images) from the SDRAM 104 to free up working memory (for example one iToF raw depth frame from a 1 Megapixel sensor may occupy 128 Mbit in the SDRAM, which is organized in multiples of 8 bits).
However, in order to improve the training of an ANN with respect to a certain intended functionality, it is desirable to attain a large database of labelled training data (i.e. depth data from the ToF sensor together with the corresponding class it belongs to). According to the embodiments described below in more detail, upon false negative detection, knowledge about false negative results is used to produce new training data for re-training the ANN. A desired action in a car (which may be implemented as an application, see Fig. 2) with a respective intended functionality, like for example opening a lock of a car door, starting the engine of a car, increasing the volume of an audio system in a car, applying the brake of a car or waking up a drowsy driver of a car (by sending out a stimulus signal), may be performed by the host processor 103 or by a car ECU. These actions/intended functionalities may be performed based on the fulfillment of a first criterion as described above. That is, the first criterion may be determined as fulfilled based on the classification information of the ANN. However, these actions/intended functionalities may also be performed based on the fulfillment of a secondary criterion. This secondary criterion may be determined as fulfilled based on a secondary mechanism. The secondary mechanism may be the manual triggering of the performing of the intended functionality or the triggering of the intended functionality by an alternative application.
For example, for the intended functionality of opening a lock of a door, the secondary mechanism may be the using of the key to unlock the car door. The opening of the lock of the car door may be performed if the secondary criterion is determined as fulfilled based on the using of the key to unlock the car door. Still further, for the intended functionality of increasing the volume of the audio system, the secondary mechanism may be operating the knobs or touch screen on the car's console to increase the volume. The increasing of the volume of the audio system is performed if the secondary criterion is determined as fulfilled based on operating the knobs or touch screen on the car's console. Still further, for the intended functionality of braking the car, the secondary mechanism may be the (manual) operating of the brake pedal. The braking is performed if the secondary criterion is determined as fulfilled based on the operating of the brake pedal. Still further, the secondary mechanism may be the triggering of the intended functionality by an alternative application. For example, for the intended functionality of sending out a stimulus signal (as also triggered by the driver drowsiness detection), the secondary mechanism may be the output signal of a lane assistant (for example if a lane is unusually crossed). The sending out of a stimulus signal to the driver is performed if the secondary criterion is determined as fulfilled based on the output signal of the lane assistant.
It may be determined that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism if the secondary mechanism is activated/triggered/executed.
Still further, the confidence value of the intended class of the depth data may be below the acceptance threshold Thr_Acc (such that the intended functionality is not performed) but still above a predetermined monitoring threshold Mon_Acc (see Fig. 4). The predetermined monitoring threshold Mon_Acc may be a small predetermined amount below the acceptance threshold Thr_Acc, for example the monitoring threshold Mon_Acc may be 1% or 5% or 10% below the acceptance threshold Thr_Acc. This means that the depth data is classified as "just narrowly" not within the intended class. That means that the first criterion for performing the intended functionality is not fulfilled based on the classification (outside the intended class) for these depth data samples. If, however, the secondary criterion based on the secondary mechanism is determined as fulfilled (i.e. the secondary mechanism is triggered) within a short time span after it was determined that the first criterion is not fulfilled, there is a high probability that the depth data is a false negative. That means that the depth data is classified by the ANN as not belonging to the intended class, although it is a member of that intended class (that is, the intended functionality should have been performed based on the ANN). Below (see Fig. 4) it is described how to determine if depth data is classified false negative based on the secondary mechanism and the confidence value of the depth data.
These depth data samples together with their corresponding labels may be stored in the NVM as training data. Therefore, when operating an ANN trained with respect to an intended functionality, it is desirable to flag interesting candidates of depth data upfront which may later turn out to be false negatives, and not to release them from the volatile memory 104 for a certain time span after processing. The time span (see Fig. 4, "sliding window operation") during which these depth data should not be released from the volatile memory 104 should be short enough not to use up more memory than necessary but long enough that most likely (for example with 90%, 95% or 98% certainty) the secondary mechanism for a certain application is triggered/executed, if it is triggered/executed at all. During this time span the relevant depth data is kept in the volatile memory 104 (instead of being released immediately after the ANN has processed it). If in this time span the secondary mechanism is activated, the interesting depth data is still available in the volatile memory 104 and can be written from the volatile memory 104 to the non-volatile memory 106, together with its corresponding class label.
Fig. 4 schematically shows a sliding window operation for determining depth data classified as false negative by the ANN. The graph 300 shows the time on the x-axis and, on the y-axis, the confidence value for the intended class of depth data which is classified by an ANN (for example performing face identification for opening a door of the car, performing gesture identification for increasing the volume of the audio system, performing driver drowsiness detection or the like). The ToF depth data (or sequences of depth images) 403-1,...,403-6 are received by the ANN from a ToF sensor over time, and classified by determining a confidence value (for each available class) for each depth data item (or by determining one confidence value for each sequence of depth images). If the confidence value of the intended class for the application of a depth data item 403 exceeds a predetermined acceptance threshold Thr_Acc 401, the classification of the depth data 403 is used for further processing, as for example for 403-1. If the confidence value of the intended class of a depth data item 403 is below a predetermined monitoring threshold Mon_Acc 402, the depth data 403 is not used for further processing and the depth data is released from the SDRAM 104 immediately, as for example for 403-2 and 403-3. The predetermined monitoring threshold Mon_Acc 402 is defined as a confidence value for the intended class of the ANN which is a predetermined amount below the acceptance threshold Thr_Acc 401, for example 1% or 5% or 10% below the acceptance threshold Thr_Acc 401. The depth data 403-4 is received at time t1 and has a confidence value for the intended class below the acceptance threshold Thr_Acc 401 (i.e. the first criterion for performing the intended functionality is not fulfilled based on the classification result for the depth data) but above the monitoring threshold Mon_Acc 402. That means the depth data 403-4 is an interesting candidate which may later turn out to be false negative classified depth data. Therefore, the depth data 403-4 is not immediately released from the SDRAM 104. Instead, a sliding window operation 404 is activated with a time span of Tsw (sliding window time). That means that during this time span Tsw, i.e. from the time t1 until the time t1+Tsw, the depth data 403-4 is kept in the SDRAM 104. If during the time span Tsw, that is at a time t ≤ t1+Tsw, the secondary criterion based on the secondary mechanism for performing the intended functionality is fulfilled, the depth data 403-4 is transferred from the SDRAM 104 to the non-volatile memory 106 together with its corresponding label stemming from the secondary mechanism. The corresponding label is the class to which the depth data correctly belongs, which is the intended class indicated by the fulfilment of the secondary criterion based on the secondary mechanism. In the example shown, the secondary mechanism is executed for the depth data 403-4 at a time t which satisfies t ≤ t1+Tsw; therefore the depth data 403-4 together with its corresponding label from the secondary mechanism is transferred to the non-volatile memory 106.
The time t1 may be the time at which the depth data is generated at the ToF sensor 101, or the time at which the inference on the depth data in the ANN is finished, or a time in between. However, since the time span between the generation of the depth data in the ToF sensor and the finishing of the inference of the ANN on the depth data is very small compared to the time span Tsw, this may not be important.
The monitoring threshold Mon_Acc 402 should be a small amount below the acceptance threshold Thr_Acc 401, for example between 1-15%, such that interesting candidates are identified without storing too many depth data samples in the SDRAM 104 which later do not turn out to be classified as false negatives. The monitoring threshold Mon_Acc 402 is needed to limit the amount of SDRAM working memory 104 used for the sliding window, because it may not be possible to simply store each depth data sample for a longer time span after it is processed. The time span Tsw of the sliding window during which a depth data sample should not be released may be a time span within which most likely (for example with 95% certainty) the secondary criterion based on the secondary mechanism for the intended functionality is fulfilled, if it is fulfilled at all. Therefore, the time span of the sliding window varies depending on the respective use case (i.e. the intended functionality) and may vary between 1s and 60s. For example, after not performing the intended functionality of starting the car engine or opening the lock of the car door based on the first criterion, the secondary mechanism (using the key to unlock the car door) may be executed within the next 60 seconds, and therefore the sliding window time span Tsw may be Tsw = 60s. For example, after not performing the intended functionality of increasing the volume of the audio system based on the first criterion, the secondary mechanism (operating the knobs or touch screen on the car's console to increase the volume) may be executed within the next 5 seconds, and therefore the sliding window time span Tsw may be Tsw = 5s. For example, after not performing the intended functionality of braking based on the first criterion (brake assistance), the secondary mechanism (operating of the brake pedal) may be executed within the next 3 seconds, and therefore the sliding window time span Tsw may be Tsw = 3s. For example, after not performing the intended functionality of sending out a stimulus signal based on the first criterion (drowsiness detection), the secondary mechanism (output signal of the lane assistant) may be triggered within the next 30 seconds, and therefore the sliding window time span Tsw may be Tsw = 30s.
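The example values above can be summarised in a small per-functionality configuration; a sketch in Python, where the keys and values merely repeat the examples given here and are not fixed by the disclosure:

    # Sliding window time span Tsw per intended functionality, in seconds;
    # chosen so that the respective secondary mechanism is most likely triggered within the window.
    SLIDING_WINDOW_TSW = {
        "open_door_lock":      60.0,  # secondary mechanism: key is used to unlock the door
        "increase_volume":      5.0,  # secondary mechanism: knob or touch screen is operated
        "brake_assistance":     3.0,  # secondary mechanism: brake pedal is operated
        "drowsiness_stimulus": 30.0,  # secondary mechanism: lane assistant outputs a signal
    }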
For example, the ANN for face identification for opening a lock of the door of the car or starting the engine of the car classifies a person as not authorized (that is, the confidence value of the intended class is above the monitoring threshold Mon_Acc and below the acceptance threshold Thr_Acc), and then the key is used to manually trigger the unlocking a short time period later (see sliding window in Fig. 4). Or the ANN for gesture identification for increasing the volume of the audio system classifies the gesture as not increasing the volume (that is, the confidence value of the intended class is above the monitoring threshold Mon_Acc and below the acceptance threshold Thr_Acc), and then the knobs or touch screen on the car's console are operated to increase the volume a short time period later. Or the ANN for the brake assistance classifies depth data of a situation in front of the car as not worth braking (that is, the confidence value of the intended class is above the monitoring threshold Mon_Acc and below the acceptance threshold Thr_Acc), and then the brake pedal is operated a short time period later. Or the ANN of the driver drowsiness detection classifies depth data of a driver as awake (that is, the confidence value of the intended class is above the monitoring threshold Mon_Acc and below the acceptance threshold Thr_Acc), and then the lane assistant outputs a signal (to output a stimulus signal) a short period of time later.
Therefore, these depth data, classified false negative by the ANN, together with their corresponding correct labels, i.e. the intended class they belong to (e.g. the authorized-class, the increase-volume-class, the brake-class, the drowsy-class etc.), which are delivered by the secondary mechanism, should be identified (captured) and stored. This labelled depth data may then be used as training data for the ANN to improve the operation of the ANN for the intended functionality for images/people/use cases not properly covered by the existing training database.
Fig. 5 shows a flowchart of capturing false negative depth data of an ANN with respect to an intended functionality. In step 501, depth data is generated with a ToF sensor at time t1. In step 502, the depth data is stored on the volatile memory 104. In step 503, the depth data is read from the volatile memory into the host processor which carries out the ANN processing. In step 504, classification of the depth data is performed in an ANN of the depth-based ANN application and a confidence value of the intended class is obtained. In step 505, it is asked if the confidence value of the intended class of the depth data is between the monitoring threshold Mon_Acc and the acceptance threshold Thr_Acc, i.e. Mon_Acc < confidence value < Thr_Acc. If the answer in step 505 is no, the process proceeds with step 506. In step 506, the depth data is dropped from the volatile memory and the process ends (for example, the confidence value may exceed the acceptance threshold and thereby the intended functionality may be performed based on the fulfillment of the first criterion). If the answer in step 505 is yes, the process proceeds with step 507. In this case the intended functionality is not performed because the first criterion is not fulfilled based on the classification confidence value of the depth data. In step 507, a sliding window operation (see Fig. 4) is started with a time span of Tsw. That means that during the time span Tsw the depth data should not be released from the volatile memory. In step 508, it is asked if the current time t is below the generation time of the depth data t1 plus the time span Tsw of the sliding window, i.e. t < t1+Tsw. If the answer in step 508 is no, the process proceeds with step 509. In step 509, the depth data is released from the volatile memory and the process ends. If the answer in step 508 is yes, the process proceeds with step 510. In step 510, it is asked if the corresponding secondary mechanism for the application was executed (activated). That means it is determined if the intended functionality is performed based on the fulfillment of the secondary criterion based on the secondary mechanism. If the answer in step 510 is no, the process proceeds again with step 508 (i.e. the intended functionality is not performed because the secondary criterion based on the secondary mechanism is also not fulfilled). If the answer in step 510 is yes, the process proceeds with step 511. That means it is determined that the intended functionality is performed based on the fulfillment of the secondary criterion based on the secondary mechanism. In step 511, the depth data and the corresponding label from the secondary mechanism are stored in a non-volatile memory and the process ends.
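A compact Python sketch of this flow; the memory interfaces and the secondary-mechanism check are hypothetical placeholders used only to illustrate steps 505 to 511:

    import time

    def capture_false_negative(depth_data, confidences, intended_class,
                               thr_acc, mon_acc, t_sw,
                               secondary_mechanism_triggered, store_to_nvm):
        """Keep a 'near miss' classification in working memory for Tsw seconds and, if the
        secondary mechanism fires within that window, store it as labelled training data."""
        conf = confidences[intended_class]
        if not (mon_acc < conf < thr_acc):     # step 505: outside the monitoring band
            return False                       # step 506: drop depth data from volatile memory
        t1 = time.monotonic()                  # step 507: start the sliding window
        while time.monotonic() < t1 + t_sw:    # step 508: still inside the window?
            if secondary_mechanism_triggered():                 # step 510
                store_to_nvm(depth_data, label=intended_class)  # step 511: false negative captured
                return True
            time.sleep(0.05)                   # polling interval, illustrative only
        return False                           # step 509: window expired, release depth data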
The depth data and the corresponding label from the secondary mechanism can then be read out from the NVM during service and sent to an “ANN training-center”, that is, for example, a central server infrastructure or a cloud-based service, and the ANN for the application can be re-trained (see Fig. 6).
Fig. 6 schematically shows a feedback loop of capturing false negative classified ToF depth data and re-training of the ANN with respect to an intended functionality. In step 601, a classification of the depth data is performed with the ANN. Between step 601 and step 602, the depth data and the corresponding secondary mechanism label are captured as described above with Fig. 4. In step 602, the captured depth data and its corresponding secondary mechanism label are stored in the non-volatile memory. Between step 602 and step 603, the stored depth data and its corresponding secondary mechanism label are sent to an ANN training-center (for example a central server infrastructure, or a cloud-based service). In step 603, re-training of the ANN is performed (that is, its coefficients are adapted to the new data) with the new depth data and its corresponding secondary mechanism label in the ANN training-center. Between step 603 and step 601, the updated coefficients are loaded into the ANN of the application.
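A minimal sketch of the re-training step of Fig. 6 is given below, assuming a generic training framework; the helper names (captured_samples, train_step, get_weights) are illustrative assumptions and not part of the embodiment:

```python
def retrain_from_captured_samples(ann_model, captured_samples, train_step, epochs=3):
    """Sketch of the Fig. 6 feedback loop: fine-tune the deployed ANN on the
    false-negative depth frames and their secondary-mechanism labels, then
    return the updated coefficients for redeployment (names are illustrative)."""
    # between steps 602 and 603: samples read out from the NVM and uploaded
    training_pairs = [(s.depth_map, s.intended_class) for s in captured_samples]

    # step 603: adapt the coefficients to the new data (a few fine-tuning passes)
    for _ in range(epochs):
        for depth_map, label in training_pairs:
            train_step(ann_model, depth_map, label)

    # between steps 603 and 601: updated coefficients pushed back to the vehicle
    return ann_model.get_weights()
```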
After retraining and re-applying the new coefficients to the ANN, its performance will be better for the whole customer base of the application (for example in the car), resolving implicit biases or shortcomings in the original training set.
Implementation
Fig. 7 is a block diagram depicting an example of schematic configuration of a vehicle control system 7000 as an example of a mobile body control system to which the capturing of false negative depth data of an ANN with respect to an intended functionality can be applied. The vehicle control system 7000 includes a plurality of electronic control units connected to each other via a communication network 7010. In the example depicted in Fig. 7, the vehicle control system 7000 includes a driving system control unit 7100, a body system control unit 7200, a battery control unit 7300, an outside-vehicle information detecting unit 7400, an in-vehicle information detecting unit 7500, and an integrated control unit 7600 (for example the ECU 108, or the host processor 103). The communication network 7010 connecting the plurality of control units to each other may, for example, be a vehicle-mounted communication network compliant with an arbitrary standard (see interface 109) such as controller area network (CAN), local interconnect network (LIN), local area network (LAN), FlexRay (registered trademark), or the like.
Each of the control units may include: a microcomputer that performs arithmetic processing according to various kinds of programs; a storage section (for example the SDRAM 104, or the NVM 106) that stores the programs executed by the microcomputer, parameters used for various kinds of operations, or the like; and a driving circuit that drives various kinds of control target devices. Each of the control units further includes: a network interface (I/F) (for example the interfaces 102, 105, 107, 109) for performing communication with other control units via the communication network 7010; and a communication I/F for performing communication with a device, a sensor (for example the ToF sensor 101), or the like within and without the vehicle by wire communication or radio communication. A functional configuration of the integrated control unit 7600 illustrated in Fig. 7 includes a microcomputer 7610, a general-purpose communication I/F 7620, a dedicated communication I/F 7630, a positioning section 7640, a beacon receiving section 7650, an in-vehicle device I/F 7660, a sound/image output section 7670, a vehicle-mounted network I/F 7680, and a storage section 7690 (for example the SDRAM 104, or the NVM 106).
The other control units similarly include a microcomputer, a communication I/F, a storage section, and the like.
The driving system control unit 7100 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. For example, the driving system control unit 7100 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine, a driving motor, or the like, a driving force transmitting mechanism for transmitting the driving force to wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like. The driving system control unit 7100 may have a function as a control device of an antilock brake system (ABS), electronic stability control (ESC), or the like.
The driving system control unit 7100 is connected with a vehicle state detecting section 7110. The vehicle state detecting section 7110, for example, includes at least one of a gyro sensor that detects the angular velocity of axial rotational movement of a vehicle body, an acceleration sensor that detects the acceleration of the vehicle, and sensors for detecting an amount of operation of an accelerator pedal, an amount of operation of a brake pedal, the steering angle of a steering wheel, an engine speed or the rotational speed of wheels, and the like. The driving system control unit 7100 performs arithmetic processing using a signal input from the vehicle state detecting section 7110, and controls the internal combustion engine, the driving motor, an electric power steering device, the brake device, and the like.
The body system control unit 7200 controls the operation of various kinds of devices provided to the vehicle body in accordance with various kinds of programs. For example, the body system control unit 7200 functions as a control device for a keyless entry system (for example via face recognition with an ANN), a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like. In this case, radio waves transmitted from a mobile device as an alternative to a key or signals of various kinds of switches can be input to the body system control unit 7200. The body system control unit 7200 receives these input radio waves or signals, and controls a door lock device, the power window device, the lamps, or the like of the vehicle.
The battery control unit 7300 controls a secondary battery 7310, which is a power supply source for the driving motor, in accordance with various kinds of programs. For example, the battery control unit 7300 is supplied with information about a battery temperature, a battery output voltage, an amount of charge remaining in the battery, or the like from a battery device including the secondary battery 7310. The battery control unit 7300 performs arithmetic processing using these signals and performs control for regulating the temperature of the secondary battery 7310 or controls a cooling device provided to the battery device or the like.
The outside-vehicle information detecting unit 7400 detects information about the outside of the vehicle including the vehicle control system 7000 (for example used by the brake assistant). For example, the outside-vehicle information detecting unit 7400 is connected with at least one of an imaging section 7410 and an outside-vehicle information detecting section 7420. The imaging section 7410 includes at least one of a time-of-flight (ToF) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras. The outside-vehicle information detecting section 7420, for example, includes at least one of an environmental sensor for detecting current atmospheric conditions or weather conditions and a peripheral information detecting sensor for detecting another vehicle, an obstacle, a pedestrian, or the like on the periphery of the vehicle including the vehicle control system 7000.
The environmental sensor, for example, may be at least one of a rain drop sensor detecting rain, a fog sensor detecting a fog, a sunshine sensor detecting a degree of sunshine, and a snow sensor detecting a snowfall. The peripheral information detecting sensor may be at least one of an ultrasonic sensor, a radar device, and a LIDAR device (Light detection and Ranging device, or Laser imaging detection and ranging device). Each of the imaging section 7410 and the outside-vehicle information detecting section 7420 may be provided as an independent sensor or device or may be provided as a device in which a plurality of sensors or devices are integrated.
Fig. 8 is a diagram of assistance in explaining an example of installation positions of an outside-vehicle information detecting section and an imaging section. Imaging sections 7910, 7912, 7914, 7916, and 7918 are, for example, disposed at at least one of positions on a front nose, sideview mirrors, a rear bumper, and a back door of the vehicle 7900 and a position on an upper portion of a windshield within the interior of the vehicle. The imaging section 7910 provided to the front nose and the imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 7900. The imaging sections 7912 and 7914 provided to the sideview mirrors obtain mainly an image of the sides of the vehicle 7900. The imaging section 7916 provided to the rear bumper or the back door obtains mainly an image of the rear of the vehicle 7900. The imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle is used mainly to detect a preceding vehicle, a pedestrian, an obstacle, a signal, a traffic sign, a lane, or the like.
Incidentally, Fig. 8 depicts an example of photographing ranges of the respective imaging sections 7910, 7912, 7914, and 7916. An imaging range a represents the imaging range of the imaging section 7910 provided to the front nose. Imaging ranges b and c respectively represent the imaging ranges of the imaging sections 7912 and 7914 provided to the sideview mirrors. An imaging range d represents the imaging range of the imaging section 7916 provided to the rear bumper or the back door. A bird's-eye image of the vehicle 7900 as viewed from above can be obtained by superimposing image data imaged by the imaging sections 7910, 7912, 7914, and 7916, for example.
Outside-vehicle information detecting sections 7920, 7922, 7924, 7926, 7928, and 7930 provided to the front, rear, sides, and corners of the vehicle 7900 and the upper portion of the windshield within the interior of the vehicle may be, for example, an ultrasonic sensor or a radar device. The outside-vehicle information detecting sections 7920, 7926, and 7930 provided to the front nose of the vehicle 7900, the rear bumper, the back door of the vehicle 7900, and the upper portion of the windshield within the interior of the vehicle may be a LIDAR device, for example. These outside-vehicle information detecting sections 7920 to 7930 are used mainly to detect a preceding vehicle, a pedestrian, an obstacle, or the like.
Returning to Fig. 7, the description will be continued. The outside-vehicle information detecting unit 7400 makes the imaging section 7410 image an image of the outside of the vehicle and receives imaged image data (for example depth data). In addition, the outside-vehicle information detecting unit 7400 receives detection information from the outside-vehicle information detecting section 7420 connected to the outside-vehicle information detecting unit 7400. In a case where the outside-vehicle information detecting section 7420 is an ultrasonic sensor, a radar device, or a LIDAR device (or a ToF sensor 101), the outside-vehicle information detecting unit 7400 transmits an ultrasonic wave, an electromagnetic wave, or the like, and receives information of a received reflected wave.
On the basis of the received information, the outside-vehicle information detecting unit 7400 may perform processing of detecting an object such as a human, a vehicle, an obstacle, a sign, a character on a road surface, or the like, or processing of detecting a distance thereto. The outside-vehicle information detecting unit 7400 may perform environment recognition processing of recognizing a rainfall, a fog, road surface conditions, or the like on the basis of the received information. The outside-vehicle information detecting unit 7400 may calculate a distance to an object outside the vehicle on the basis of the received information. In addition, on the basis of the received image data, the outside-vehicle information detecting unit 7400 may perform image recognition processing of recognizing a human, a vehicle, an obstacle, a sign, a character on a road surface, or the like, or processing of detecting a distance thereto. The outside-vehicle information detecting unit 7400 may subject the received image data to processing such as distortion correction, alignment, or the like, and combine the image data imaged by a plurality of different imaging sections 7410 to generate a bird's-eye image or a panoramic image.
The outside-vehicle information detecting unit 7400 may perform viewpoint conversion processing using the image data imaged by the imaging section 7410 including the different imaging parts.
The in-vehicle information detecting unit 7500 detects information about the inside of the vehicle. The in-vehicle information detecting unit 7500 is, for example, connected with a driver state detecting section 7510 that detects the state of a driver (for example for driver drowsiness detection). The driver state detecting section 7510 may include a camera that images the driver, a biosensor that detects biological information of the driver, a microphone that collects sound within the interior of the vehicle, or the like. The biosensor is, for example, disposed in a seat surface, the steering wheel, or the like, and detects biological information of an occupant sitting in a seat or the driver holding the steering wheel. On the basis of detection information input from the driver state detecting section 7510, the in-vehicle information detecting unit 7500 may calculate a degree of fatigue of the driver or a degree of concentration of the driver or may determine whether the driver is dozing. The in-vehicle information detecting unit 7500 may subject an audio signal obtained by the collection of the sound to processing such as noise canceling processing or the like.
The integrated control unit 7600 controls general operation within the vehicle control system 7000 in accordance with various kinds of programs. The integrated control unit 7600 is connected with an input section 7800. The input section 7800 is implemented by a device capable of input operation by an occupant, such, for example, as a touch panel, a button, a microphone, a switch, a lever, or the like. The integrated control unit 7600 may be supplied with data obtained by voice recognition of voice input through the microphone. The input section 7800 may, for example, be a remote control device using infrared rays or other radio waves, or an external connecting device such as a mobile telephone, a personal digital assistant (PDA), or the like that supports operation of the vehicle control system 7000. The input section 7800 may be, for example, a camera. In that case, an occupant can input information by gesture. Alternatively, data may be input which is obtained by detecting the movement of a wearable device that an occupant wears. Further, the input section 7800 may, for example, include an input control circuit or the like that generates an input signal on the basis of information input by an occupant or the like using the above-described input section 7800, and which outputs the generated input signal to the integrated control unit 7600. An occupant or the like inputs various kinds of data or gives an instruction for processing operation to the vehicle control system 7000 by operating the input section 7800.
The storage section 7690 may include a read only memory (for example NVM 106) that stores various kinds of programs executed by the microcomputer and a random access memory (for example SDRAM 104) that stores various kinds of parameters, operation results, sensor values, or the like. In addition, the storage section 7690 may be implemented by a magnetic storage device such as a hard disc drive (HDD) or the like, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
The general-purpose communication I/F 7620 is a communication I/F used widely, which communication I/F mediates communication with various apparatuses present in an external environment 7750. The general-purpose communication I/F 7620 may implement a cellular communication protocol such as global system for mobile communications (GSM (registered trademark)), worldwide interoperability for microwave access (WiMAX (registered trademark)), long term evolution (LTE (registered trademark)), LTE-advanced (LTE-A), or the like, or another wireless communication protocol such as wireless LAN (referred to also as wireless fidelity (Wi-Fi (registered trademark)), Bluetooth (registered trademark), or the like. The general-purpose communication I/F 7620 may, for example, connect to an apparatus (for example, an application server or a control server) present on an external network (for example, the Internet, a cloud network, or a company-specific network) via a base station or an access point. In addition, the general-purpose communication I/F 7620 may connect to a terminal present in the vicinity of the vehicle (which terminal is, for example, a terminal of the driver, a pedestrian, or a store, or a machine type communication (MTC) terminal) using a peer to peer (P2P) technology, for example.
The dedicated communication I/F 7630 is a communication I/F that supports a communication protocol developed for use in vehicles. The dedicated communication I/F 7630 may implement a standard protocol such, for example, as wireless access in vehicle environment (WAVE), which is a combination of institute of electrical and electronic engineers (IEEE) 802.11p as a lower layer and IEEE 1609 as a higher layer, dedicated short range communications (DSRC), or a cellular communication protocol. The dedicated communication I/F 7630 typically carries out V2X communication as a concept including one or more of communication between a vehicle and a vehicle (Vehicle to Vehicle), communication between a road and a vehicle (Vehicle to Infrastructure), communication between a vehicle and a home (Vehicle to Home), and communication between a pedestrian and a vehicle (Vehicle to Pedestrian).
The positioning section 7640, for example, performs positioning by receiving a global navigation satellite system (GNSS) signal from a GNSS satellite (for example, a GPS signal from a global positioning system (GPS) satellite), and generates positional information including the latitude, longitude, and altitude of the vehicle. Incidentally, the positioning section 7640 may identify a current position by exchanging signals with a wireless access point, or may obtain the positional information from a terminal such as a mobile telephone, a personal handyphone system (PHS), or a smart phone that has a positioning function.
The beacon receiving section 7650, for example, receives a radio wave or an electromagnetic wave transmitted from a radio station installed on a road or the like, and thereby obtains information about the current position, congestion, a closed road, a necessary time, or the like. Incidentally, the function of the beacon receiving section 7650 may be included in the dedicated communication I/F 7630 described above.
The in-vehicle device I/F 7660 is a communication interface that mediates connection between the microcomputer 7610 and various in-vehicle devices 7760 present within the vehicle. The in-vehicle device I/F 7660 may establish wireless connection using a wireless communication protocol such as wireless LAN, Bluetooth (registered trademark), near field communication (NFC), or wireless universal serial bus (WUSB). In addition, the in-vehicle device I/F 7660 may establish wired connection by universal serial bus (USB), high-definition multimedia interface (HDMI (registered trademark)), mobile high-definition link (MHL), or the like via a connection terminal (and a cable if necessary) not depicted in the figures. The in-vehicle devices 7760 may, for example, include at least one of a mobile device and a wearable device possessed by an occupant and an information device carried into or attached to the vehicle. The in-vehicle devices 7760 may also include a navigation device that searches for a path to an arbitrary destination. The in-vehicle device I/F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
The vehicle-mounted network I/F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010. The vehicle-mounted network I/F 7680 transmits and receives signals or the like in conformity with a predetermined protocol supported by the communication network 7010.
The microcomputer 7610 (for example the host processor 103) of the integrated control unit 7600 controls the vehicle control system 7000 in accordance with various kinds of programs on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. For example, the microcomputer 7610 may calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the obtained information about the inside and outside of the vehicle, and output a control command to the driving system control unit 7100. For example, the microcomputer 7610 may perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS) which functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like. In addition, the microcomputer 7610 may perform cooperative control intended for automatic driving, which makes the vehicle travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the obtained information about the surroundings of the vehicle.
The microcomputer 7610 may generate three-dimensional distance information between the vehicle and an object such as a surrounding structure, a person, or the like, and generate local map information including information about the surroundings of the current position of the vehicle, on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. In addition, the microcomputer 7610 may predict danger such as collision of the vehicle, approaching of a pedestrian or the like, an entry to a closed road, or the like on the basis of the obtained information, and generate a warning signal. The warning signal may, for example, be a signal for producing a warning sound or lighting a warning lamp.
The sound/image output section 7670 transmits an output signal of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of Fig. 7, an audio speaker 7710, a display section 7720, and an instrument panel 7730 are illustrated as the output device. The display section 7720 may, for example, include at least one of an on-board display and a head-up display. The display section 7720 may have an augmented reality (AR) display function. The output device may be other than these devices, and may be another device such as headphones, a wearable device such as an eyeglass type display worn by an occupant or the like, a projector, a lamp, or the like. In a case where the output device is a display device, the display device visually displays results obtained by various kinds of processing performed by the microcomputer 7610 or information received from another control unit in various forms such as text, an image, a table, a graph, or the like. In addition, in a case where the output device is an audio output device, the audio output device converts an audio signal constituted of reproduced audio data or sound data or the like into an analog signal, and auditorily outputs the analog signal.
Incidentally, at least two control units connected to each other via the communication network 7010 in the example depicted in Fig. 7 may be integrated into one control unit. Alternatively, each individual control unit may include a plurality of control units. Further, the vehicle control system 7000 may include another control unit not depicted in the figures. In addition, part or the whole of the functions performed by one of the control units in the above description may be assigned to another control unit. That is, predetermined arithmetic processing may be performed by any of the control units as long as information is transmitted and received via the communication network 7010. Similarly, a sensor or a device connected to one of the control units may be connected to another control unit, and a plurality of control units may mutually transmit and receive detection information via the communication network 7010.
It should be recognized that the embodiments describe methods with an exemplary ordering of method steps. The specific ordering of method steps is, however, given for illustrative purposes only and should not be construed as binding.
It should also be noted that the division of the electronic device of Fig. 1 into units is only made for illustration purposes and that the present disclosure is not limited to any specific division of functions in specific units.
All units and entities described in this specification and claimed in the appended claims can, if not stated otherwise, be implemented as integrated circuit logic, for example, on a chip, and functionality provided by such units and entities can, if not stated otherwise, be implemented by software.
In so far as the embodiments of the disclosure described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a transmission, storage or other medium by which such a computer program is provided are envisaged as aspects of the present disclosure.
Note that the present technology can also be configured as described below:
(1) An electronic device comprising circuitry configured to obtain depth data; perform (504) classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data, perform an intended functionality if a primary criterion or secondary criterion is fulfilled; determine that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determine that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; determine (510), based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.
(2) The electronic device of (1), wherein the circuitry is configured to re-train the ANN if the depth data is classified as false negative.
(3) The electronic device (103) of (1) or (2), wherein the circuitry is configured to store (502) the obtained depth data on a volatile memory (104).
(4) The electronic device (103) of anyone of (1) to (3), wherein the circuitry is configured to store (511) the depth data on a non-volatile memory (106) if the depth data is classified as false negative.
(5) The electronic device (103) of anyone of (1) to (4), wherein the circuitry is configured to determine (510) that the classification of the depth data is false negative if the circuitry determined that the secondary criterion for performing the intended functionality is fulfilled based on the secondary mechanism within a predetermined time span (Tsw) after the circuitry determined that the primary criterion for performing the intended functionality is not fulfilled based on the classification information.
(6) The electronic device (103) of anyone of (1) to (5), wherein the classification information comprises a confidence value related to the intended functionality, and wherein the primary criterion for performing the intended functionality is fulfilled if the confidence value is above a predetermined acceptance threshold (Thr_Acc).
(7) The electronic device (103) of (6), wherein the circuitry is configured to keep the obtained depth data in the volatile memory (104) for a predetermined time span (Tsw) after performing classification on the depth data by the ANN if the confidence value of the depth data for the intended class is below the predetermined acceptance threshold (Thr_Acc) and above a predetermined monitoring threshold (Thr_Mon).
(8) The electronic device (103) of anyone of (1) to (7), wherein the circuitry is configured to store a label indicating a class related to the intended functionality, together with the corresponding depth data as labeled depth data on the non-volatile memory (106).
(9) The electronic device (103) of (8), wherein the circuitry is configured to re-train the ANN based on the labeled depth data stored on the non-volatile memory (106).
(10) The electronic device (103) of anyone of (1) to (9), wherein the depth data is based on an image generated by a Time-of-Flight sensor (101).
(11) The electronic device (103) of anyone of (1) to (10), wherein the intended functionality relates to an automotive environment.
(12) The electronic device (103) of anyone of (1) to (11), wherein the ANN is configured to perform face identification, and wherein the intended functionality relates to opening a door lock of a car, and wherein the secondary criterion for performing the intended functionality is a criterion for opening the door lock.
(13) The electronic device (103) of (12), wherein the circuitry is configured to determine that, with respect to the secondary mechanism, the secondary criterion for opening the door lock is fulfilled if a key is used to manually unlock the car door.
(14) The electronic device (103) of anyone of (1) to (13), wherein the ANN is configured to perform gesture recognition, and wherein the intended functionality relates to increasing a volume of an audio system in a car, and wherein the secondary criterion for performing the intended functionality is a secondary criterion for increasing the volume.
(15) The electronic device (103) of (14), wherein the circuitry is configured to determine that, with respect to the secondary mechanism, the secondary criterion for increasing the volume is fulfilled if a knob or touch screen on the car’s console is operated manually to increase the volume.
(16) The electronic device (103) of anyone of (1) to (15), wherein the ANN is a convolutional neural network.
(17) The electronic device (103) of anyone of (1) to (16), wherein the ANN is implemented as software or as hardware.
(18) A method comprising: obtaining depth data and storing (502) the depth data on a volatile memory (104); performing (504) classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data, performing an intended functionality if a primary criterion or a secondary criterion is fulfilled; determining that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determining that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; determining (510), based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.

Claims

1. An electronic device comprising circuitry configured to obtain depth data; perform classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data, perform an intended functionality if a primary criterion or secondary criterion is fulfilled; determine that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determine that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; determine, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.
2. The electronic device of claim 1, wherein the circuitry is configured to re-train the ANN if the depth data is classified as false negative.
3. The electronic device of claim 1, wherein the circuitry is configured to store the obtained depth data on a volatile memory.
4. The electronic device of claim 1, wherein the circuitry is configured to store the depth data on a non-volatile memory if the depth data is classified as false negative.
5. The electronic device of claim 1, wherein the circuitry is configured to determine that the classification of the depth data is false negative if the circuitry determined that the secondary criterion for performing the intended functionality is fulfilled based on the secondary mechanism within a predetermined time span after the circuitry determined that the primary criterion for performing the intended functionality is not fulfilled based on the classification information.
6. The electronic device of claim 1, wherein the classification information comprises a confidence value related to the intended functionality, and wherein the primary criterion for performing the intended functionality is fulfilled if the confidence value is above a predetermined acceptance threshold.
7. The electronic device of claim 6, wherein the circuitry is configured to keep the obtained depth data in the volatile memory for a predetermined time span after performing classification on the depth data by the ANN if the confidence value of the depth data for the intended class is below the predetermined acceptance threshold and above a predetermined monitoring threshold.
8. The electronic device of claim 1, wherein the circuitry is configured to store a label indicating a class related to the intended functionality, together with the corresponding depth data as labeled depth data on the non-volatile memory.
9. The electronic device of claim 8, wherein the circuitry is configured to re-train the ANN based on the labeled depth data stored on the non-volatile memory.
10. The electronic device of claim 1, wherein the depth data is based on an image generated by a Time-of-Flight sensor.
11. The electronic device of claim 1, wherein the intended functionality relates to an automotive environment.
12. The electronic device of claim 1, wherein the ANN is configured to perform face identification, and wherein the intended functionality relates to opening a door lock of a car, and wherein the secondary criterion for performing the intended functionality is a criterion for opening the door lock.
13. The electronic device of claim 12, wherein the circuitry is configured to determine that, with respect to the secondary mechanism, the secondary criterion for opening the door lock is fulfilled if a key is used to manually unlock the car door.
14. The electronic device of claim 1, wherein the ANN is configured to perform gesture recognition, and wherein the intended functionality relates to increasing a volume of an audio system in a car, and wherein the secondary criterion for performing the intended functionality is a secondary criterion for increasing the volume.
15. The electronic device of claim 14, wherein the circuitry is configured to determine that, with respect to the secondary mechanism, the secondary criterion for increasing the volume is fulfilled if a knob or touch screen on the car’s console is operated manually to increase the volume.
16. The electronic device of claim 1, wherein the ANN is a convolutional neural network.
17. The electronic device of claim 1, wherein the ANN is implemented as software or as hardware.
18. A method comprising: obtaining depth data and storing the depth data on a volatile memory; performing classification of the depth data by an artificial neural network, ANN, to determine classification information for the depth data, performing an intended functionality if a primary criterion or a secondary criterion is fulfilled; determining that the primary criterion for performing the intended functionality is fulfilled based on the classification information obtained by the ANN; determining that the secondary criterion for performing the intended functionality is fulfilled based on a secondary mechanism; determining, based on the secondary mechanism and the classification information obtained by the ANN, if the classification of the depth data is false negative.