GB2523632A

GB2523632A - System for storing and analysing automotive data

Info

Publication number: GB2523632A
Application number: GB1422757.3A
Authority: GB
Inventors: Gautham Raju; Kirankumar Reddy; Mayank Jalan; Vishal Ratra
Original assignee: Daimler AG
Current assignee: Mercedes Benz Group AG
Priority date: 2014-12-19
Filing date: 2014-12-19
Publication date: 2015-09-02

Abstract

System (10) for storing and analysing automotive data (12), comprising an ETL server (16) for receiving the automotive data (12) from at least one data source (14) such as a test vehicle, dynamometer, diagnostic test set up or simulation tool as a function of a common reference basis such as a function of time, wherein the ETL server (16) is configured to combine subsets (46) of the automotive data (12) from a predefined interval of the common reference basis into data blocks (18) and the system (10) comprises a computer cluster (22) for storing the data blocks (18) at respective storage places and a relational database management system (24) for storing meta data (26) identifying the respective storage place of each data block (18).

Description

System for storing and analysing automotive data

The invention relates to a system for storing and analysing automotive data, comprising an ETL server (ETL: "Extract, Transform and Load") for receiving the automotive data from at least one data source, wherein the automotive data comprise signal values as a function of a common reference basis. In addition, the invention relates to a network.

Data logging is an integral part of the automotive development chain. With the advent of new communication and hardware technologies, the volume of data logged and stored from automobiles is increasing exponentially. The data are typically logged from CAN buses (Controller Area Network), FlexRay networks and, in the case of development cars, from internal ECU (Engine Control Unit) variables. The large volume of measured data varies greatly in form and structure: data can be structured, semi-structured or highly unstructured.

US 2014/0040575 A1 shows techniques for mobile clusters for collecting telemetry data and processing analytic tasks.

It is the object of the present invention to provide a solution to efficiently structure and store a large amount of automotive data and to quickly analyse the stored automotive data.

According to the invention, this object is solved by a system as well as a network having the features according to the respective independent claims. Advantageous implementations of the invention are the subject matter of the dependent claims, of the description and of the figures.

A system for storing and analysing automotive data according to the invention comprises an ETL server for receiving the automotive data from at least one data source as a function of a common reference basis. Moreover, the ETL server is configured to combine subsets of the automotive data from a predefined interval of the common reference basis into data blocks and the system comprises a computer cluster for storing the data blocks at respective storage places and a relational database management system for storing meta data identifying the respective storage place of each data block.

The ETL server is preferably configured to execute a so-called "Extract, Transform and Load" process. During the "Extract" step, automotive data are extracted from homogeneous or heterogeneous data sources. The automotive data can be available in different formats, but they are all based on the same reference basis. For example, the reference basis can be a time signal. During the "Transform" step, the extracted automotive data are transformed into a format or structure suitable for querying and analysis purposes. According to the invention, the automotive data are transformed by splitting them into subsets, wherein each subset refers to a predefined interval of the common reference basis. For instance, if automotive data originate from different data sources, automotive data from each of the data sources can be allocated to each value of the common reference basis within one predefined interval. Thus, the automotive data are sorted according to the common reference basis.
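The splitting performed in the "Transform" step can be illustrated by a minimal Python sketch. It assumes time in milliseconds as the common reference basis; the function name split_into_intervals, the tuple layout of a sample and the example values are hypothetical and not taken from the description.

```python
from collections import defaultdict


def split_into_intervals(samples, interval_ms):
    """Group (time_ms, source, signal, value) samples by fixed time interval.

    Returns a dict mapping interval index -> samples sorted by time.
    The tuple layout is an assumption made for this sketch.
    """
    subsets = defaultdict(list)
    for sample in samples:
        time_ms = sample[0]
        # Integer division assigns each sample to exactly one interval.
        subsets[time_ms // interval_ms].append(sample)
    # Sort every subset by time so data from all sources are interleaved.
    return {
        index: sorted(rows, key=lambda row: row[0])
        for index, rows in sorted(subsets.items())
    }


# Hypothetical samples from two data sources, deliberately out of order.
samples = [
    (1500, "dynamometer", "torque", 212.0),
    (20, "test_vehicle", "Veh_Spd", 0.0),
    (990, "test_vehicle", "Veh_Spd", 3.5),
    (1010, "test_vehicle", "Veh_Spd", 3.6),
]
subsets = split_into_intervals(samples, interval_ms=1000)
```

With an interval of 1000 ms, the samples at 20 ms and 990 ms fall into the first subset and those at 1010 ms and 1500 ms into the second, irrespective of their data source.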

The subsets are apportioned to data blocks. Advantageously, the data are stored using fixed point arithmetic. During the "Load" step, the automotive data, i.e. the data blocks, are loaded into a final target. According to the invention, the data blocks are distributed to computers of a computer cluster, wherein each computer of the computer cluster represents a data node. Simultaneously, meta data are stored in a relational database management system. The meta data may concern information about the storage place, viz. the respective data nodes, and optionally about the content of the data blocks.
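Storage in fixed point arithmetic can be pictured as keeping an integer count of a fixed resolution instead of a floating point number. The following Python sketch is illustrative only; the resolution of 1 mV per count and the names to_fixed, from_fixed and SCALE_V are assumptions, since the description does not specify a scaling.

```python
def to_fixed(value, scale):
    """Encode a physical value as an integer number of `scale`-sized steps."""
    return round(value / scale)


def from_fixed(raw, scale):
    """Decode the stored integer back into a physical value."""
    return raw * scale


SCALE_V = 0.001  # assumed resolution: 1 mV per count
raw = to_fixed(3.712, SCALE_V)
restored = from_fixed(raw, SCALE_V)
```

Only the integer raw needs to be written to the data block; the scale would be kept once per signal, for example as part of the meta data.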

The meta data allow fast access to the corresponding data without searching all the computers of the cluster for the automotive data. The system according to the invention is very cost efficient, because it works well with commodity hardware. Due to the block format of the data blocks, the system is capable of massively parallel computing, which allows fast computation.
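How meta data can avoid a search across the whole cluster may be sketched with an in-memory SQLite database standing in for the relational database management system. The table name block_meta, its columns, the node names and the function locate_blocks are hypothetical; the description only requires that the storage place of each data block be identifiable.

```python
import sqlite3

# In-memory database as a stand-in for the relational database management system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block_meta ("
    " block_id INTEGER PRIMARY KEY,"
    " data_node TEXT NOT NULL,"
    " t_start_ms INTEGER NOT NULL,"
    " t_end_ms INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO block_meta VALUES (?, ?, ?, ?)",
    [(1, "node-a", 0, 1000), (2, "node-b", 1000, 2000), (3, "node-a", 2000, 3000)],
)


def locate_blocks(connection, t_from_ms, t_to_ms):
    """Return (block_id, data_node) for blocks overlapping [t_from_ms, t_to_ms)."""
    rows = connection.execute(
        "SELECT block_id, data_node FROM block_meta"
        " WHERE t_start_ms < ? AND t_end_ms > ? ORDER BY block_id",
        (t_to_ms, t_from_ms),
    )
    return rows.fetchall()


hits = locate_blocks(conn, 900, 1500)
```

A request for the range from 900 ms to 1500 ms touches only the first two blocks, so only the data nodes holding them need to be contacted.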

Preferably, the automotive data are provided as a function of time as the common reference basis. Many of the data being logged in automobiles are available in time-series format with different sampling rates and multiple time references. The automotive data irrespective of the corresponding data source can be disposed in time intervals. Within one time interval each time step can be allocated at least one data value from at least one data source. So, the system is highly structured.

Advantageously, the at least one data source is a test-vehicle and/or a dynamometer and/or a diagnostic test setup and/or a simulation tool. Those data sources can provide data in different formats. Due to the invention, all data can be sorted according to the common reference basis. Thus, a person who wants to analyse the data can obtain the automotive data from all data sources corresponding to one value of the common reference basis, e.g. corresponding to one time step. Hence, the system allows an extensive and reliable analysis of the developed automobiles.

It can be provided that the computer cluster is built up as a Hadoop Distributed File System. Hadoop is open-source software for reliable, scalable and distributed computing. The Hadoop Distributed File System according to the invention can handle time-dependency and correlations among signals in an advantageous manner.

Techniques for clusters for processing analytic tasks with the help of Hadoop are disclosed in US 2014/0040575 A1.

In addition, a network comprising at least one system according to the invention is associated with the invention. The systems can be spatially distributed and used e.g. by different departments of a company. For instance, each department using the system according to the invention can perform different tests and thus collect automotive data being stored in the advantageous manner described above. Therefore each department, and thus each of the systems, of the company can be based e.g. in another country.

It proves advantageous if the network comprises a central analysis computer connected to each of the at least one system, wherein the central analysis computer is configured to receive a user-defined specific request concerning a specific part of the automotive data, the part being defined by a search request pattern, and to transmit the specific request to each of the at least one system. The user-defined request can be a request or a task defined by a user and is independent of the position of the user. The request relates to a specified part of the automotive data, for example automotive data of a certain time step or a certain data source, and/or to a calculation specification that is to be applied to a specified part of the automotive data. The central analysis computer is configured to send the request concerning the specified part of the automotive data to all of the systems connected to the network.

Particularly preferably, each of the at least one system is configured to execute the request, to provide a result of the specific request and to transmit the result to the central analysis computer. Transferring all test drive data to the central analysis computer would be very slow. Thus, according to the invention, the systems are configured to locally execute the request in order to provide the specified part of the automotive data as a result of the user-defined request. Only the results of the request are transmitted to the central analysis computer. The central analysis computer is configured to execute a real time analysis due to the fast and efficient transmission of the results.

In an advantageous development, the at least one system is configured to execute the specific request by using a MapReduce algorithm. A MapReduce algorithm, which can be locally executed within each system, reduces the automotive data of each data node to a final output. Using the MapReduce algorithm, a parallel and fast computation of data can be achieved and less data need to be transferred on the network.

The preferred embodiments presented with respect to the system according to the invention and the advantages thereof correspondingly apply to the network according to the invention.

Further features of the invention are apparent from the claims, the figures and the description of figures. All of the features and feature combinations mentioned above in the description as well as the features and feature combinations mentioned below in the description of figures and/or shown in the figures alone are usable not only in the respectively specified combination, but also in other combinations or else alone.

In the following, the invention is explained in more detail based on a preferred embodiment as well as with reference to the attached drawings.

There show:
Fig. 1 a schematic illustration of a system according to the invention;
Fig. 2 a schematic illustration of an extract, transform and load process (ETL) of an ETL server;
Fig. 3 a schematic illustration of a block design for a Hadoop Distributed File System;
Fig. 4 a schematic illustration of a network according to the invention; and
Fig. 5 a schematic illustration of a MapReduce algorithm applied to a system according to the invention.

Fig. 1 shows a system 10 for storing and analysing automotive data 12. The system 10 is a so-called offline architecture for storage and analytics of automotive measurement data 12. The automotive data 12 can be provided from different data sources 14. For instance, the automotive data 12 can be provided by a vehicle bus of a test vehicle, from a dynamometer, from test drives or from diagnostic tools. The automotive data 12 can be available in a time-series format as a common reference basis with different sampling rates and multiple time rates and in different formats, e.g. a measurement data format like MDF or DIAdem.

The automotive data 12 can be transferred to an ETL server 16 of the system 10. The ETL server 16 is configured to apportion the automotive data 12 to data blocks 18. Each of the data blocks 18 contains a subset of the automotive data 12 from a specified time interval.

The data blocks 18 are distributed to data nodes 20 of a computer cluster 22. The computer cluster 22 can be built up as a Hadoop Distributed File System. The system 10 also includes a relational database management system 24 for storing meta data 26 provided by the ETL server 16. The meta data 26 contain information about the respective data node 20, viz. the storage place, of each data block 18 as well as information about the content of each data block 18.

In order to analyse the automotive data 12 stored as data blocks 18 on the data nodes 20 of the computer cluster 22, a user 28 can define a request 30 or a job concerning a specified part of the automotive data 12. The request 30 can be transmitted via a web server 32. The web server 32 is configured to communicate with both the computer cluster 22 and the relational database management system 24. The relational database management system 24 allows fast access to the specified part of the automotive data 12 within the computer cluster 22. The specified part of the automotive data 12 is provided as a result 34 or final output and is transferred to the web server 32. The result 34 can be used for large scale analytics 36.

The following user-defined request 30 exemplifies the necessity for efficient storage and analytics of automotive measurement data 12. The example relates to a function and control strategy for a validation of a battery management system (BMS). The developers of the battery management system want to know whether they need to change the software of the battery management system. Therefore, they analyse whether the battery voltage behaves normally during low state-of-charge under cold conditions.

The request contains e.g. the following criteria:
PNHV_Cell_Volt_Max >= PNHV_Cell_Volt_Max_Limit
PNHV_Bat_Curr 0
Veh_Spd >= 0
PNHV_Bat_SOC <= 20
PNHV_Bat_Temp < 0°C
Car-Id = 2xx-xxx
Duration = complete life cycle of car,
wherein PNHV_Cell_Volt_Max (2 ms) is the maximum cell voltage of a battery at a certain time step, PNHV_Cell_Volt_Max_Limit (2 ms) is the limit for cell voltage at a certain time step, PNHV_Bat_Curr (1 ms) is a current signal of the battery at a certain time step, PNHV_Bat_SOC (10 ms) is the state of charge signal for the battery at a certain time step, PNHV_Bat_Temp (20 ms) is the temperature of the battery at a certain time step and Veh_Spd (100 ms) is the vehicle speed at a certain time step.
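Purely for illustration, the signal-related criteria of such a request can be expressed as a predicate over one time-aligned sample, as in the following Python sketch. The dictionary layout, the function name matches and the example values are hypothetical, and the signal names are written in their presumed spelling. The condition on PNHV_Bat_Curr is deliberately left out, because its comparison operator cannot be read from the text; Car-Id and duration would select the data to be scanned rather than filter individual samples.

```python
def matches(sample):
    """True if one time-aligned sample satisfies the legible criteria.

    The PNHV_Bat_Curr condition is omitted: its operator is not
    recoverable from the description.
    """
    return (
        sample["PNHV_Cell_Volt_Max"] >= sample["PNHV_Cell_Volt_Max_Limit"]
        and sample["Veh_Spd"] >= 0
        and sample["PNHV_Bat_SOC"] <= 20
        and sample["PNHV_Bat_Temp"] < 0
    )


# Hypothetical samples; units follow the criteria (percent SOC, degrees Celsius).
cold_low_soc = {
    "PNHV_Cell_Volt_Max": 4.25,
    "PNHV_Cell_Volt_Max_Limit": 4.20,
    "Veh_Spd": 30,
    "PNHV_Bat_SOC": 15,
    "PNHV_Bat_Temp": -5,
}
warm = dict(cold_low_soc, PNHV_Bat_Temp=20)
flags = [matches(cold_low_soc), matches(warm)]
```

Because the signals are sampled at different rates (1 ms to 100 ms), such a predicate presupposes that the samples have already been aligned on the common time basis.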

During a complete life cycle of a car, a huge amount of data is generated, which needs to be handled. The data are collected from test-vehicles, dynamometers, diagnostic tests, simulations and many other systems. The data analysis has to be performed on all of the data.

By means of the system according to the invention, a cheap and reliable solution for performing data analysis tasks, and thus a fast computation, is achieved. The system is scalable and provides a common framework for different tasks.

Fig. 2 shows a schematic illustration of the "Extract, Transform and Load" process (ETL) executed by the ETL server 16. Within a first step S21 a measurement file is provided. The measurement file comprises the automotive data 12 as input or raw data. The automotive data 12 can be available in different formats. Within a second step S22 the automotive data 12 are prepared. Within a third step S23 the prepared data are stored in a Hadoop Distributed File System 38 and in a relational database management system 24.

For storing the prepared data in the Hadoop Distributed File System 38, the prepared data are split into blocks 18 based on time range as common reference basis and size. Thus, each block 18 contains all signals in the same time range. Hence, it is possible to handle correlation among the signals. The data can be stored in fixed point arithmetic and thus have the same format as in the ECU. A block header is added to each block 18. Storing the address of the data or signals in the block header allows loading of only required signals from the block 18. Each block 18 is stored in a data node 20.
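The benefit of storing signal addresses in the block header can be sketched as follows in Python. The binary layout used here (a 32-bit header length followed by one entry of signal id, relative offset and byte length per signal) is an assumption made for the example, as are the names build_block and load_signal; the description does not prescribe a concrete encoding.

```python
import struct

# Assumed header entry: uint16 signal id, uint32 relative offset, uint32 length.
ENTRY = struct.Struct("<HII")


def build_block(signals):
    """Serialise {signal_id: bytes} into a header followed by the payload."""
    header_len = 4 + ENTRY.size * len(signals)
    header = [struct.pack("<I", header_len)]
    payload = b""
    for signal_id, data in signals.items():
        # The offset is relative to the start of the payload.
        header.append(ENTRY.pack(signal_id, len(payload), len(data)))
        payload += data
    return b"".join(header) + payload


def load_signal(block, wanted_id):
    """Parse only the header, then slice out the one requested signal."""
    (header_len,) = struct.unpack_from("<I", block, 0)
    for position in range(4, header_len, ENTRY.size):
        signal_id, offset, length = ENTRY.unpack_from(block, position)
        if signal_id == wanted_id:
            start = header_len + offset
            return block[start:start + length]
    raise KeyError(wanted_id)


block = build_block({11: b"\x01\x02", 12: b"\x03\x04\x05"})
```

Calling load_signal(block, 12) reads the header and then only the three bytes belonging to signal 12; the bytes of signal 11 are never decoded.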

In order to allow fast access to the automotive data 12 of each data block 18, file and signal information 40 is extracted from the data prepared within step S22. Furthermore, signal statistics 42 can be computed from the file and signal information 40 and stored as the meta data 26 in the relational database management system 24.
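One conceivable use of such signal statistics is to rule out data blocks before any of them is opened. The following Python sketch assumes that minimum, maximum and sample count are kept per signal and block; the choice of statistics, the names signal_stats and may_contain and the example values are hypothetical.

```python
def signal_stats(values):
    """Compute simple per-block statistics for one signal."""
    return {"min": min(values), "max": max(values), "count": len(values)}


def may_contain(stats, upper_bound):
    """A block can only hold a value <= upper_bound if its minimum does."""
    return stats["min"] <= upper_bound


# Hypothetical state-of-charge values of two data blocks.
stats_by_block = {
    1: signal_stats([55, 48, 41]),
    2: signal_stats([35, 22, 18]),
}
# Only blocks that can satisfy a condition such as SOC <= 20 remain candidates.
candidates = [
    block_id
    for block_id, stats in stats_by_block.items()
    if may_contain(stats, 20)
]
```

Here only the second block remains a candidate, so the first block would not have to be read from its data node at all.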

A structure of a data block 18 is exemplarily shown in Fig. 3. Each data block 18 comprises a block header 44 and a subset 46 of automotive data 12. The block header 44 comprises a length 48 of the block header 44 and the relative address 50 of the automotive data 12 within the data block 18. The subset 46 of automotive data 12 is sorted according to the time steps t_1 and t_2. Signals sig_11 and sig_12 are allocated to the time step t_1, wherein signal sig_11 can be provided from a first data source and signal sig_12 can be provided from a second data source. A signal sig_21 is allocated to the time step t_2, wherein the signal sig_21 can be provided from the first data source.

There is also shown a signal sig_nm. The signal sig_nm corresponds to a time step n and can be provided from a data source m. Each of the signals sig_11, sig_12, sig_21, sig_nm comprises values val_1, val_2, val_n.

Fig. 4 shows a schematic illustration of a network 52 comprising four systems 10, wherein the systems 10 are placed in different departments 54. Here, the departments 54 may be spread all over the world, which is visualized by the shapes of the departments 54 shown, wherein each shape belongs to a specific country. Here, a first department 54 is based in Germany, a second department 54 is based in Spain, a third department 54 is based in South Africa and a fourth department 54 is based in Sweden.

Within each department 54 automotive data 12 from data sources 14 are stored in the system 10 as described above. The systems 10 of the network 52 are all connected to a central analysis computer 56. A user 28 sitting anywhere can ask for a specific analysis immediately by defining a request 30 concerning a special part of the automotive data 12.

Transferring all the data 12 from each system 10 to the central analysis computer 56 would be very slow. Thus, the request 30 is transferred to each system 10 in each department 54 of the network 52 by the central analysis computer 56. The request 30 is locally executed at each system 10. Each system 10 provides results 34. Only the results 34 of each system are sent back to the central analysis computer 56, where a real time analysis 58 of the results 34 can be executed. Due to the local computation, elaborate analyses, such as an iterative analysis to find an exact solution or a variety of analyses, can be performed. The local computation and the real time analysis 58 also allow fast iterative test drives with improvements, e.g. updates in software, or with new requirements to verify and validate the updates.

Summing up, various analyses can be performed locally on test drive data, and only the results 34 are transferred to the department concerned. Hence, a huge amount of data 12 can be handled.
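The interplay between the central analysis computer and the locally executing systems can be sketched in Python as follows. The class LocalSystem, the function central_analysis, the request format and the example values are hypothetical; threads merely stand in for network calls to the departments.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalSystem:
    """Stand-in for one system 10 holding its data locally."""

    def __init__(self, name, soc_values):
        self.name = name
        self._soc_values = soc_values  # raw data never leave this object

    def execute(self, request):
        """Run the request locally and return only a small result."""
        hits = [v for v in self._soc_values if v <= request["max_soc"]]
        return {"system": self.name, "hits": len(hits)}


def central_analysis(systems, request):
    """Send the same request to every system and collect the results."""
    with ThreadPoolExecutor() as pool:
        # map() preserves the order of the systems in its output.
        return list(pool.map(lambda system: system.execute(request), systems))


systems = [
    LocalSystem("Germany", [80, 15, 12]),
    LocalSystem("Spain", [90, 70]),
]
results = central_analysis(systems, {"max_soc": 20})
```

Each system returns one small dictionary regardless of how many raw values it holds, which is the point of executing the request locally.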

Fig. 5 shows a schematic illustration of a MapReduce algorithm applied to a system 10 in order to provide results 34 according to a user-defined request 30.

Within a first step S51 a user starts jobs or requests 30 through a Web Server Interface. Within a second step S52 only the data blocks 18 relevant to the job or request 30 are processed locally on data nodes 20. Within a third step S53, the block header 44 is loaded and parsed to find the address of the required signals 60. This process is called "Map". Within a fourth step S54, the required signals 60 are loaded and a computation defined by the request 30 is performed. This is also part of the "Map"-process. Within a fifth step S55, intermediate output 62 for each block 18 is generated. This is also part of the "Map"-process. Within a sixth step S56 the intermediate output 62 is combined. This is part of the "Reduce"-process. Within a seventh step S57, the final output, i.e. the result 34, is provided.
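The steps S53 to S57 can be pictured with a small single-process Python sketch. It is not Hadoop code: the functions map_block and reduce_outputs, the dictionary representation of a block and the example request are assumptions made for illustration.

```python
from functools import reduce


def map_block(block, request):
    """Map: load only the required signal of one block and compute locally."""
    values = block[request["signal"]]  # other signals are not touched
    hits = sum(1 for value in values if value <= request["threshold"])
    return {"hits": hits, "samples": len(values)}  # intermediate output


def reduce_outputs(left, right):
    """Reduce: combine two intermediate outputs into one."""
    return {
        "hits": left["hits"] + right["hits"],
        "samples": left["samples"] + right["samples"],
    }


# Hypothetical blocks, each holding all signals of one time range.
blocks = [
    {"PNHV_Bat_SOC": [25, 19, 18], "Veh_Spd": [0, 0, 5]},
    {"PNHV_Bat_SOC": [30, 31], "Veh_Spd": [10, 12]},
]
request = {"signal": "PNHV_Bat_SOC", "threshold": 20}
intermediate = [map_block(block, request) for block in blocks]
result = reduce(reduce_outputs, intermediate)
```

In a cluster, map_block would run on the data node that stores the block, and only the intermediate outputs would be combined.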

List of reference signs
10 system
12 automotive data
14 data source
16 ETL server
18 data block
20 data node
22 computer cluster
24 relational database management system
26 meta data
28 user
30 request
32 web server
34 result
36 large scale analytics
38 Hadoop Distributed File System
40 file and signal information
42 signal statistics
44 block header
46 subset
48 length of block header
50 relative address
52 network
54 department
56 central analysis computer
58 real time analysis
60 required signal
62 intermediate output
t_1, t_2 time steps
sig_11, sig_12, sig_21, sig_nm signals
S21, S22, S23, S51-S57 steps

Claims

1. System (10) for storing and analysing automotive data (12), comprising an ETL server (16) for receiving the automotive data (12) from at least one data source (14) as a function of a common reference basis, characterized in that the ETL server (16) is configured to combine subsets (46) of the automotive data (12) from a predefined interval of the common reference basis into data blocks (18) and the system (10) comprises a computer cluster (22) for storing the data blocks (18) at respective storage places and a relational database management system (24) for storing meta data (26) identifying the respective storage place of each data block (18).
2. System (10) according to claim 1, characterized in that the automotive data (12) are provided as a function of time (t_1, t_2) as a common reference basis.
3. System (10) according to claim 1 or 2, characterized in that the at least one data source (14) is a test-vehicle and/or a dynamometer and/or a diagnostic test setup and/or a simulation tool.
4. System (10) according to any one of the preceding claims, characterized in that the computer cluster (22) is built up as a Hadoop Distributed File System (38).
5. Network (52) comprising at least one system (10) according to any one of the preceding claims.
6. Network (52) according to claim 5, characterized in that the network (52) comprises a central analysis computer (56) connected to each of the at least one system (10), wherein the central analysis computer (56) is configured to receive a user-defined specific request (30) concerning a specific part of the automotive data (12), the part being defined by a search request pattern, and to transmit the specific request (30) to each of the at least one system (10).
7. Network (52) according to claim 6, characterized in that each of the at least one system (10) is configured to execute the request (30), to provide a result (34) of the specific request (30) and to transmit the result (34) to the central analysis computer (56).
8. Network (52) according to claim 7, characterized in that the at least one system (10) is configured to execute the specific request (30) by using a MapReduce algorithm.