WO2017190757A1 - System and method for distributed data analysis - Google Patents
- Publication number
- WO2017190757A1 (application PCT/EP2016/000713)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- measurement data
- storage device
- analysis
- computing
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
Definitions
- A distributed data analysis system for analyzing collected measurement data comprises a data input device configured to receive measurement data, a first storage device associated with the data input device and configured to store the measurement data input via the data input device, and a first computing device associated with the first storage device.
- A second storage device is configured to store measurement data previously stored on the first storage device.
- A second computing device is associated with the second storage device, and a data distribution device is configured to distribute the measurement data between the first storage device and the second storage device based on at least one predetermined criterion.
- A data management device is configured to store a location of the measurement data and to update the stored location based on the distribution by the data distribution device.
- Fig. 3 shows a schematic overview of a distributed data analysis system in accordance with another embodiment of the present invention.
- Each service point 104 includes a data input device 10 for receiving the measurement data generated during the test drive.
- The amount of data will typically be several GB or several tens of GB.
- Each service point 104 includes a storage device 12, for example, one or more hard disks for storing the
- Main computing system 102 includes a main storage device 16 and an associated main computing device 18.
- Storage device 16 has a much larger storage capacity than, for example, storage devices 12, 13.
- Storage device 16 serves as a "data lake" storing the bulk of the available measurement data collected by one or more vehicles.
- Storage devices 13, 16 and computing devices 15, 18 may be part of a large computing cluster 31, for example, a Hadoop cluster.
- The measurement data offloaded at service point 106 can be easily transferred to main storage device 16 because service point 106 is co-located with main computing system 102.
- Service point 104, however, may be located anywhere around the globe, for example, in areas with a poor internet connection. As a result, the data offloaded at service point 104 may not be available for analysis on main computing system 102 in a timely manner. In the present embodiment, part of the analysis of the measurement data is therefore performed at local service point 104, in combination with an analysis that is performed in parallel on the measurement data available at the headquarters. This will be described in more detail below.
- Distributed data analysis system 100 includes a data management device 22 that is in communication with computing device 14 of each service point 104, as well as with main computing system 102.
- Data management device 22 may include a web server running the DaSense software.
- Data management device 22 is configured to store a location of the measurement data that is offloaded at each service point 104 and that is stored on main storage device 16, for example, in an appropriate database.
- Data management device 22 receives the metadata generated by service point 104 during the data ingest process and forwards it to main computing system 102. In this manner, the location of all the data that is available for performing a particular analysis is stored, for example, on main computing system 102.
- Data movement planner 68 may determine that data needs to be moved from service point 104 to main computing system 102, e.g., because the maximum local data size on storage device 12 has been exceeded.
- Data movement planner 68 is configured to choose between moving the data online, i.e., via data link 108, and offline, i.e., by physical mail, for example, on portable hard drives 32 sent via DHL or a similar courier service.
- The corresponding determinations are forwarded to an online move queue 70 and an offline move queue 72.
- The locations of the moved data are continuously updated and stored by data management device 22 to be used in subsequent queries.
- Specific queries may be automatically generated by distributed data analysis system 100, for example, standard queries for certain car behaviors, car locations, data types and the like, which may be generated on a regular basis, and the results of the queries may be stored for future reference without immediately being reported to a user.
- A predefined report may be generated after a predetermined time period has elapsed, for example, on a weekly basis, and the available data may be retrieved by engineers in a web client or as a PDF document at a later time.
- Data analysis system 200 is suitable for use in an autonomous driving application.
- A large number of algorithms have to be developed to interpret incoming sensory data from, e.g., cameras, radar or lidar systems or the like in order to maintain an accurate representation of the vehicle state and its environment.
- The sensory data, which normally has to be analyzed in real time, is recorded so that new versions of an algorithm can be tested on the same data set.
- The data rate is extremely high, for example, around 2 GB per second. Clearly, this requires a large available storage space. Therefore, the test drives are typically performed in the vicinity of main computing system 202 at the headquarters.
- Second server node 117 may have an intermediate amount of computing power for data that is unlikely to be accessed in the immediate future but may be accessed in the foreseeable future, or that is accessed less frequently than the "hot" data (referred to herein as "warm" data). It will be appreciated that additional server nodes for data that is even less likely to be accessed, having even less computing power and considerably higher storage capacity, may also be provided (for "cold" data). In addition, an object store 140 that has practically no computing power is provided for data that is outdated, but has to be kept for various reasons ("frozen" data). It should be noted that, in some embodiments, at least some of the nodes having data with different temperatures may also be provided at geographically different locations, instead of being co-located with each other, for example, at the headquarters.
- Data distribution device 120 is configured to classify the measurement data stored on the respective storage devices into data having different priorities, for example, based on one or more predetermined criteria, and to transfer data having a low priority to a server node that has lower computing power.
- Data distribution device 120 may be configured to classify some measurement data stored on first storage device 112 as having a lower priority and transfer it to second storage device 116.
- Data that is stored on, for example, second storage device 116 may be transferred back to first storage device 112, if necessary.
- Data classification can be based on, for example, access times, creation dates, or other metadata or content-related criteria.
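The temperature-based classification described in the items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the tier names ("hot", "warm", "cold", "frozen") come from the text, but the `Record` type, the `classify` and `plan_transfers` functions, and the day thresholds are hypothetical. The source only says that classification may be based on access times, creation dates, or other metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds in days since last access; the source gives no numbers.
TIER_LIMITS = [("hot", 7), ("warm", 90), ("cold", 365)]


@dataclass
class Record:
    name: str
    last_access: datetime


def classify(record: Record, now: datetime) -> str:
    """Return the storage tier for a record based on its last access time."""
    age_days = (now - record.last_access).days
    for tier, limit in TIER_LIMITS:
        if age_days <= limit:
            return tier
    # Older than every limit: outdated data that still has to be kept.
    return "frozen"


def plan_transfers(records, current_tier, now):
    """List (name, from_tier, to_tier) moves for records whose tier changed."""
    moves = []
    for rec in records:
        target = classify(rec, now)
        if current_tier[rec.name] != target:
            moves.append((rec.name, current_tier[rec.name], target))
    return moves
```

In this sketch a record can also move back to a hotter tier once it is accessed again, which corresponds to the transfer from the second storage device back to the first one mentioned above.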
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a distributed data analysis system (100) for analyzing large amounts of collected measurement data, for example, data accumulated during road tests of vehicles (2) in the field of automotive engineering. Upon receiving a query for an analysis to be performed on the collected measurement data, an analysis device (26) determines which of a plurality of different storage devices (12, 16) at geographically different locations contain measurement data relevant to the query, and performs the analysis on the measurement data stored on the appropriate storage devices. The partial results of the analysis are combined and returned to a user. Data are transferred between the storage devices (12, 16) based on, for example, a remaining storage capacity of the storage devices or a priority of the measurement data.
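The query flow summarized in the abstract (look up which sites hold the relevant data, analyze each part where it is stored, combine the partial results) can be sketched as follows. This is a hedged, in-memory illustration only: the `catalog` and `site_data` dictionaries, the dataset ids, the speed values, and the functions `partial_analysis`, `run_query`, and `record_move` are all hypothetical, and a mean is used as a stand-in for whatever analysis a real query would request.

```python
from collections import defaultdict

# Hypothetical location catalog: dataset id -> site currently holding it.
catalog = {"drive-001": "service_point_104", "drive-002": "main_system_102"}

# Hypothetical per-site data: dataset id -> list of measured speeds (km/h).
site_data = {
    "service_point_104": {"drive-001": [50.0, 70.0]},
    "main_system_102": {"drive-002": [90.0, 110.0, 100.0]},
}


def partial_analysis(site, dataset_ids):
    """Run at one site: return (sum, count) over its local measurements."""
    values = [v for d in dataset_ids for v in site_data[site][d]]
    return sum(values), len(values)


def run_query(dataset_ids):
    """Group datasets by site, analyze each group locally, combine results."""
    by_site = defaultdict(list)
    for d in dataset_ids:
        by_site[catalog[d]].append(d)
    partials = [partial_analysis(site, ids) for site, ids in by_site.items()]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count  # mean over all sites


def record_move(dataset_id, new_site):
    """Move a dataset and update the catalog so later queries still find it."""
    old_site = catalog[dataset_id]
    site_data[new_site][dataset_id] = site_data[old_site].pop(dataset_id)
    catalog[dataset_id] = new_site
```

Only the small `(sum, count)` pairs cross site boundaries here, which is the point of analyzing data where it is stored when the link to a remote service point is slow.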
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16720348.8A EP3420451A1 (fr) | 2016-05-02 | 2016-05-02 | System and method for distributed data analysis |
PCT/EP2016/000713 WO2017190757A1 (fr) | 2016-05-02 | 2016-05-02 | System and method for distributed data analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2016/000713 WO2017190757A1 (fr) | 2016-05-02 | 2016-05-02 | System and method for distributed data analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017190757A1 (fr) | 2017-11-09 |
Family
ID=55910911
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2016/000713 WO2017190757A1 (fr) | System and method for distributed data analysis | 2016-05-02 | 2016-05-02 |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3420451A1 (fr) |
WO (1) | WO2017190757A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110187829A (zh) * | 2019-04-22 | 2019-08-30 | 上海蔚来汽车有限公司 | Data processing method, apparatus, system and electronic device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140040575A1 (en) * | 2012-08-01 | 2014-02-06 | Netapp, Inc. | Mobile hadoop clusters |
US20140195558A1 (en) * | 2013-01-07 | 2014-07-10 | Raghotham Murthy | System and method for distributed database query engines |
- 2016-05-02: EP application EP16720348.8A, patent EP3420451A1 (fr), status: not active, withdrawn
- 2016-05-02: WO application PCT/EP2016/000713, patent WO2017190757A1 (fr), status: active, application filing
Also Published As
Publication number | Publication date |
---|---|
EP3420451A1 (fr) | 2019-01-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220300273A1 (en) | | Over-the-air (ota) mobility services platform |
EP2752779B1 (fr) | | System and method for distributed database query engines |
US20190377816A1 (en) | | Tool for Creating and Deploying Configurable Enrichment Pipelines |
US20160012107A1 (en) | | Mapping query operations in database systems to hardware based query accelerators |
US20190377817A1 (en) | | Tool for Creating and Deploying Configurable Pipelines |
US20170085661A1 (en) | | Computer Systems and Methods for Sharing Asset-Related Information Between Data Platforms Over a Network |
US20210373914A1 (en) | | Batch to stream processing in a feature management platform |
US20220188194A1 (en) | | Cloud-based database backup and recovery |
US20200065405A1 (en) | | Computer System & Method for Simplifying a Geospatial Dataset Representing an Operating Environment for Assets |
JP6501675B2 (ja) | | Configurable on-board information processing |
US11797527B2 (en) | | Real time fault tolerant stateful featurization |
CN112019605A (zh) | | Data distribution method and system for data streams |
US11907913B2 (en) | | Maintaining an aircraft with automated acquisition of replacement aircraft parts |
Killeen | | Knowledge-based predictive maintenance for fleet management |
WO2017190757A1 (fr) | | System and method for distributed data analysis |
US11775864B2 (en) | | Feature management platform |
RU2718215C2 (ru) | | Data processing system and method for detecting congestion in a data processing system |
Hilgendorf | | Efficient industrial big data pipeline for lossless transfer of vehicular data |
Matesanz et al. | | Demand-driven data acquisition for large scale fleets |
US20210374637A1 (en) | | Analyzing and managing production and supply chain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | Wipo information: entry into national phase | Ref document number: 2016720348; Country of ref document: EP |
 | ENP | Entry into the national phase | Ref document number: 2016720348; Country of ref document: EP; Effective date: 20180928 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16720348; Country of ref document: EP; Kind code of ref document: A1 |