CN113688877B - Test data processing method and device, storage medium, instrument and vehicle - Google Patents

Info

Publication number
CN113688877B
Authority
CN
China
Prior art keywords
data
target
unit
processing
natural number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110870972.6A
Other languages
Chinese (zh)
Other versions
CN113688877A (en)
Inventor
朱泽伟
张光辉
高力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
United Automotive Electronic Systems Co Ltd
Original Assignee
United Automotive Electronic Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by United Automotive Electronic Systems Co Ltd
Priority to CN202110870972.6A
Publication of CN113688877A
Application granted
Publication of CN113688877B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163 Partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a test data processing method, a device, a storage medium, an instrument and a vehicle, which provide an optimization scheme for data acquisition and preprocessing during testing. The method uses a dedicated algorithm to convert extracted ASAM raw data into RDD/DataFrame data sets that Spark can process directly, and an optimization method that improves the partitioning behavior is also disclosed. The main effects are that raw ASAM data acquired by remote calibration is converted into Spark RDD/DataFrame form, yielding data types that fit the standard Spark processing flow, and that Spark's native partitioning is reasonably compensated, so that results are more accurate and the errors that arise in big-data processing when continuous segments are captured are avoided.

Description

Test data processing method and device, storage medium, instrument and vehicle
Technical Field
The invention belongs to the technical field of virtual instruments, and particularly relates to a test data processing method, a device, a storage medium, an instrument and a vehicle.
Background
For processing massive data, distributed computing based on big-data frameworks such as Spark, Storm and Flink is commonly adopted. Spark is a general-purpose parallel framework similar to Hadoop MapReduce, open-sourced by the AMP Lab at the University of California, Berkeley.
The inventors found that the massive data acquired through remote calibration is in MF4 format conforming to the ASAM (Association for Standardisation of Automation and Measuring Systems) standard. Its form differs greatly from the RDD (Resilient Distributed Dataset), DataFrame and similar structures defined by Spark, and no existing processing library or function package currently provides the corresponding functions, so the ASAM data must be converted into data types that conform to the Spark framework.
The inventors also found that the matching and calibration field studies data in a particular way: the analysis usually requires time continuity, so data segments, rather than isolated points, must be examined effectively. Traditional big-data processing tends to focus on individual sample points, so only HashPartitioner (key-based partitioning) and RangePartitioner (range partitioning), both oriented to single samples, are available.
When studying a physical phenomenon, the position marked by the annular region (100) in FIG. 1 is the event of interest to be mined. When a framework such as Spark is used for the analysis, the data volume is large, so the data must be divided and stored in partitions on the individual nodes; operating on an RDD actually means operating on the data in each partition in parallel. As shown in FIG. 1, if Spark's automatic partitioning produces two partitions (partition 0 and partition 1) whose boundary cuts through the target segment, features are lost and data mining fails.
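The truncation problem described above can be reproduced with a minimal sketch in plain Python. It does not use Spark; the helper names `split_evenly` and `contains_pattern`, the sample values and the event pattern are all hypothetical and serve only to show how a contiguous split can cut a continuous event in two.

```python
def split_evenly(samples, num_partitions):
    """Split a list into contiguous, nearly equal partitions (naive range split)."""
    size, extra = divmod(len(samples), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(samples[start:end])
        start = end
    return partitions


def contains_pattern(partition, pattern):
    """Return True if `pattern` occurs as a contiguous run inside `partition`."""
    n = len(pattern)
    return any(partition[i:i + n] == pattern
               for i in range(len(partition) - n + 1))


# Hypothetical signal; the event of interest is the contiguous run [5, 9, 5].
signal = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
event = [5, 9, 5]

partitions = split_evenly(signal, 2)
# Each partition is searched on its own, as parallel workers would do.
found_per_partition = [contains_pattern(p, event) for p in partitions]
found_in_whole_signal = contains_pattern(signal, event)
```

Here the partition boundary falls inside the event, so neither partition reports it although the unsplit signal contains it.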
Disclosure of Invention
The invention discloses a test data processing method, a device, a storage medium, an instrument and a vehicle, which provide an optimization scheme for data acquisition and preprocessing during testing. In particular, the method comprises the following steps.
The raw data of a measurable target unit is acquired as the basic resource for system analysis; it comprises first data and/or second data, wherein the first data is information of the target unit and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object.
N single-channel data are obtained by extracting the raw data in the first data and/or the second data, where N is a natural number denoting the number of channels; a seed channel is obtained from the 1st to the Nth single-channel data according to a preset rule, a new time axis is constructed from the seed channel, and the new time axis and the data values corresponding to it are filled into a first vector structure; the data of the M channels other than the seed channel are processed according to a preset rule and attached to the first vector structure to obtain a first target matrix, where M is likewise a natural number.
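The construction of the first target matrix can be sketched as follows. The text does not specify the preset rules, so this sketch assumes two of them for illustration only: the seed channel is the channel with the most samples, and the remaining channels are aligned to the new time axis by holding their most recent value. The function names and channel names are hypothetical.

```python
from bisect import bisect_right


def pick_seed_channel(channels):
    """Assumed rule: the seed is the channel with the most samples."""
    return max(channels, key=lambda name: len(channels[name]["time"]))


def hold_last_value(time, values, new_time):
    """Resample by holding the latest sample at or before each new timestamp."""
    out = []
    for t in new_time:
        idx = bisect_right(time, t) - 1
        out.append(values[idx] if idx >= 0 else None)
    return out


def build_target_matrix(channels):
    seed = pick_seed_channel(channels)
    new_time = list(channels[seed]["time"])
    # First vector structure: the new time axis plus the seed channel's values.
    columns = {"time": new_time, seed: list(channels[seed]["values"])}
    # Attach the remaining M channels, aligned to the new time axis.
    for name, channel in channels.items():
        if name != seed:
            columns[name] = hold_last_value(
                channel["time"], channel["values"], new_time)
    return seed, columns


# Two hypothetical single-channel recordings with different sample rates.
channels = {
    "engine_speed": {"time": [0.0, 0.1, 0.2, 0.3],
                     "values": [800, 820, 900, 950]},
    "coolant_temp": {"time": [0.0, 0.2], "values": [70.0, 70.5]},
}
seed, matrix = build_target_matrix(channels)
```

After the call, every column of `matrix` has the same length as the seed channel's time axis, which is what allows the channels to be treated as one matrix.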
Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are packaged to obtain T third target matrices; the T third target matrices form a first target data set, where T is a natural number.
Further, the preset processing here includes at least one of the following: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, etc.; the packaging here includes using the Seq list structure in Scala.
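The transposition and packaging steps can be illustrated with a short sketch. Python lists and tuples stand in for the Scala `Seq` and tuple structures named above; the helper names `transpose` and `package`, the column header and the sample values are assumptions made for the example.

```python
def transpose(columns):
    """Turn column-major data (one list per channel) into row-major tuples."""
    return [tuple(row) for row in zip(*columns)]


def package(matrices, header):
    """Flatten T transposed matrices into one list of (matrix_id, row) records."""
    dataset = []
    for matrix_id, columns in enumerate(matrices):
        for row in transpose(columns):
            dataset.append((matrix_id, dict(zip(header, row))))
    return dataset


header = ["time", "engine_speed"]
# Hypothetical first domain holding T = 2 column-major first target matrices.
first_domain = [
    [[0.0, 0.1], [800, 820]],
    [[5.0, 5.1], [1500, 1510]],
]
dataset = package(first_domain, header)
```

Row-oriented records of this kind are the shape that a row-based distributed collection expects, which is presumably why the transposition precedes the packaging.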
Further, the raw data here includes time, signal values, conversion formulas and character strings, and conforms to the ASAM data standard.
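As an illustration of how a conversion formula relates a stored signal value to a physical value, the following sketch assumes a linear formula of the form physical = offset + factor * raw. The text does not state which formula types are used, so the linear form, the function name and the coefficients are assumptions.

```python
def apply_linear_conversion(raw_values, offset, factor):
    """Convert raw sample values to physical values with an assumed linear formula."""
    return [offset + factor * raw for raw in raw_values]


# Hypothetical raw samples of a temperature channel and its coefficients.
raw = [0, 100, 200]
physical = apply_linear_conversion(raw, offset=-40.0, factor=0.5)
```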
Further, a first target data set to be optimized is read and segmented to obtain R partitions, where R is a natural number; the first target data set is divided by primary key into P consecutive equidistant first segments, or is divided by primary key.
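One possible reading of the equidistant segmentation is sketched below: the range of the primary key (assumed here to be the time stamp) is divided into P intervals of equal width, and each row is assigned to the interval containing its key. The function name `segment_by_key` and the sample rows are hypothetical.

```python
def segment_by_key(rows, key, num_segments):
    """Assign rows to consecutive, equal-width intervals of the primary key."""
    keys = [row[key] for row in rows]
    lo, hi = min(keys), max(keys)
    width = (hi - lo) / num_segments
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        index = int((row[key] - lo) / width) if width > 0 else 0
        # The largest key falls exactly on the upper bound; keep it in the last segment.
        index = min(index, num_segments - 1)
        segments[index].append(row)
    return segments


rows = [{"time": t, "value": v}
        for t, v in zip(range(8), [0, 0, 0, 5, 9, 5, 0, 0])]
segments = segment_by_key(rows, "time", 4)
segment_times = [[row["time"] for row in segment] for segment in segments]
```

Because the intervals follow the key rather than the row count, rows that are close in time stay in the same or in adjacent segments.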
Fixed-step calculation based on the ECU control strategy can also be adopted, wherein the fixed-step calculation comprises at least one of the following: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation.
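A time-sequenced directed graph calculation at a fixed step can be sketched as follows, assuming Python 3.9 or later for `graphlib`. The graph, the signal names, the two functions and the 10 ms step are invented for the example and are not taken from any ECU control strategy; the sketch only shows the idea of evaluating dependent signals in dependency order once per step.

```python
from graphlib import TopologicalSorter


def run_fixed_step(graph, functions, inputs, step, num_steps):
    """Evaluate a directed graph of signals once per fixed time step."""
    # `graph` maps each computed signal to the ordered list of its predecessors.
    order = list(TopologicalSorter(graph).static_order())
    history = []
    for k in range(num_steps):
        values = dict(inputs[k])
        for node in order:
            if node in functions:
                args = [values[dep] for dep in graph[node]]
                values[node] = functions[node](*args)
        history.append((round(k * step, 6), values))
    return history


# Hypothetical two-node strategy: a torque request followed by a limiter.
graph = {
    "torque_request": ["pedal", "speed"],
    "limited_torque": ["torque_request"],
}
functions = {
    "torque_request": lambda pedal, speed: pedal * 2.0 - speed * 0.01,
    "limited_torque": lambda torque: min(torque, 150.0),
}
inputs = [
    {"pedal": 50.0, "speed": 1000.0},
    {"pedal": 90.0, "speed": 1000.0},
]
history = run_fixed_step(graph, functions, inputs, step=0.01, num_steps=2)
```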
Further, the data are analyzed and processed with Spark built-in methods, and the calculation results are merged and/or de-duplicated.
The invention also relates to a test data processing device comprising an input unit, a conversion unit and an output unit.
Corresponding to the method, the input unit acquires the first data and/or the second data of the target unit; the first data is information of the target unit, and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object; the conversion unit extracts the raw data in the first data and/or the second data to obtain N single-channel data, where N is a natural number.
The conversion unit further obtains a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructs a new time axis from the seed channel, and fills the new time axis and the data values corresponding to it into the first vector structure.
The output unit processes the data of the M channels other than the seed channel according to a preset rule, attaches them to the first vector structure and outputs a first target matrix; M is likewise a natural number.
Further, the device constructs a first domain formed by T first target matrices and stores the first domain in a framework unit; performs preset processing on the T first target matrices to obtain T second target matrices; and packages the T second target matrices to obtain T third target matrices; the T third target matrices form a first target data set, where T is a natural number.
Further, the preset processing may include one of the following: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, etc.; the packaging involves using the Seq list structure in Scala.
Further, the raw data includes time, signal values, conversion formulas and character strings, and is ASAM standard data.
Further, the first target data set to be optimized is read and segmented to obtain R partitions, where R is a natural number; the first target data set is divided by primary key into P consecutive equidistant first segments, or is divided by primary key; fixed-step calculation based on an ECU control strategy is performed, wherein the fixed-step calculation comprises at least one of the following: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation; the data are analyzed and processed with Spark built-in methods; and the calculation results are merged and/or de-duplicated.
Further, the invention relates to a computer readable storage medium comprising a storage medium body for storing a computer program; any of the methods described above may be implemented when the computer program is executed by a microprocessor.
Further, the present invention relates to an analytical instrument and a vehicle, which can implement the method or operation implemented by any one of the above devices or storage media.
In the partition compensation method, a first target data set to be optimized is read and segmented to obtain R partitions, where R is a natural number; the first target data set is divided by primary key into P consecutive equidistant first segments, or is divided by primary key.
Fixed-step calculation based on the ECU control strategy can be performed as well, including time-sequenced directed graph calculation, undirected graph calculation and state graph calculation; the data are then processed further or analyzed with Spark built-in methods, and the calculation results are merged and/or de-duplicated.
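The text does not spell out how the compensation is computed, so the following sketch assumes one common approach: each partition is extended by a fixed number of samples borrowed from the start of the next partition, every partition is analyzed independently, and the partial results are merged and de-duplicated by their global index. The function names, the overlap size and the sample values are hypothetical.

```python
def split_with_compensation(samples, num_partitions, overlap):
    """Split into contiguous partitions, each extended by `overlap` samples
    borrowed from the start of the following partition."""
    size, extra = divmod(len(samples), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        compensated_end = min(end + overlap, len(samples))
        # Keep the global start index so that matches can be merged later.
        partitions.append((start, samples[start:compensated_end]))
        start = end
    return partitions


def find_pattern(offset, partition, pattern):
    """Return the global start indices at which `pattern` occurs in `partition`."""
    n = len(pattern)
    return [offset + i for i in range(len(partition) - n + 1)
            if partition[i:i + n] == pattern]


def merge_and_deduplicate(results):
    """Merge per-partition results and drop matches reported more than once."""
    return sorted({index for partial in results for index in partial})


# Hypothetical signal with two events; the second one straddles a boundary.
signal = [0, 0, 0, 0, 5, 9, 5, 5, 9, 5, 0, 0]
event = [5, 9, 5]

partitions = split_with_compensation(signal, 3, overlap=len(event))
partial_results = [find_pattern(offset, part, event)
                   for offset, part in partitions]
matches = merge_and_deduplicate(partial_results)
```

In this example the event starting at index 7 crosses the boundary between the second and third partitions and is found only because of the borrowed samples, while the event at index 4 is reported by two partitions and collapses to a single entry after de-duplication.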
It should be noted that, the words "first", "second", and the like used in the present invention are merely for describing each component element in the technical solution, and do not constitute limitation of the technical solution, and are not understood as indication or suggestion of importance of the corresponding element; elements with "first", "second" and the like mean that in the corresponding technical solution, the element includes at least one.
The invention has the advantages that:
1. The raw ASAM data acquired by remote calibration is converted to Spark RDD/DataFrame form, resulting in data types that fit the standard Spark processing flow.
2. Spark's native partitioning is reasonably compensated, so that the result is more accurate and the errors that arise in big-data processing when capturing continuous segments are avoided (as shown in FIG. 1).
Drawings
To illustrate the technical solution of the present invention more clearly and to aid understanding of its technical effects, features and objects, the invention is described in detail below with reference to the accompanying drawings. The drawings form a necessary part of the specification and, together with the embodiments, serve to illustrate the technical solution without limiting the invention.
Like reference numerals in the drawings denote like parts, in particular:
FIG. 1 is an example of lost or missing mining information in the prior art;
FIG. 2 is a schematic diagram of a data conversion process according to the present invention;
FIG. 3 is a flow chart of data conversion according to an embodiment of the present invention;
FIG. 4 is a flow chart of a partition optimization method according to an embodiment of the present invention;
FIG. 5 is a flow chart of data optimization according to an embodiment of the present invention;
wherein:
1-acquiring data and parameters, 2-extracting data, 3-reconstructing vectors, 4-constructing a matrix, 5-creating a structural domain, 6-optimizing, 7-subsequent processing and 8-generating a data set;
11-cut-slice, 22-slice compensation, 33-generate challenge samples, 44-assist transport, 55-analyze calculations, 66-merge deduplication.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. Of course, the following specific embodiments are only for explaining the technical solution of the present invention, and are not limiting.
Furthermore, the portions expressed in the examples or the drawings are merely illustrative of the relevant portions of the present invention, and not all of the present invention.
FIGS. 2 to 5 are schematic diagrams of the data processing according to the present invention, in which first data and/or second data of a target unit are obtained.
The first data is information of the target unit, and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object; N single-channel data are obtained by extracting the raw data in the first data and/or the second data, where N is a natural number.
A seed channel is obtained from the 1st to the Nth single-channel data according to a preset rule, a new time axis is constructed from the seed channel, and the new time axis and the data values corresponding to it are filled into a first vector structure; the data of the M channels other than the seed channel are processed according to a preset rule and attached to the first vector structure to obtain a first target matrix, where M is a natural number.
Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are packaged to obtain T third target matrices; the T third target matrices form a first target data set, where T is a natural number.
Further, the preset processing here includes at least one of the following: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, etc.; the packaging here includes employing the Seq list structure in Scala.
Further, the raw data includes time, signal values, conversion formulas and character strings, and is ASAM standard data.
Further, a first target data set to be optimized is read and segmented to obtain R partitions, where R is a natural number; the first target data set is divided by primary key into P consecutive equidistant first segments, or is divided by primary key.
Further, fixed-step calculation based on the ECU control strategy is performed using at least one of the following: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation; alternatively, the data are analyzed and processed with Spark built-in methods; and the calculation results are merged and/or de-duplicated.
Corresponding to the method, the invention also relates to a test data processing device comprising an input unit, a conversion unit and an output unit.
The input unit acquires first data and/or second data of the target unit; the first data is information of the target unit, and the second data is parameter information of the target unit.
The target unit is a measurable unit of the test object; the conversion unit extracts the raw data in the first data and/or the second data to obtain N single-channel data, where N is a natural number; it obtains a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructs a new time axis from the seed channel, and fills the new time axis and the data values corresponding to it into the first vector structure.
The output unit processes the data of the M channels other than the seed channel according to a preset rule, attaches them to the first vector structure and outputs a first target matrix, where M is a natural number.
Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are packaged to obtain T third target matrices; the T third target matrices form the first target data set.
Further, the preset processing includes one of the following: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3; and the packaging involves using the Seq list structure in Scala.
Further, the raw data comprises time, signal values, conversion formulas and character strings, and is ASAM standard data.
Further, as shown in FIG. 5, a first target data set to be optimized is read and segmented to obtain R partitions, where R is a natural number; the first target data set is divided by primary key into P consecutive equidistant first segments, or is divided by primary key; fixed-step calculation based on an ECU control strategy is performed, wherein the fixed-step calculation comprises at least one of the following: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation; the data are analyzed and processed with Spark built-in methods; and the calculation results are merged and/or de-duplicated.
It should be noted that the foregoing examples are merely for clearly illustrating the technical solution of the present invention, and those skilled in the art will understand that the embodiments of the present invention are not limited to the foregoing, and that obvious changes, substitutions or alterations can be made based on the foregoing without departing from the scope covered by the technical solution of the present invention; other embodiments will fall within the scope of the invention without departing from the inventive concept.

Claims (15)

1. A test data processing method, comprising:
acquiring first data and/or second data of a target unit;
wherein the first data is information of the target unit, and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object;
extracting original data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number;
obtaining a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructing a new time axis according to the seed channel, and filling the new time axis and data values corresponding to the new time axis into a first vector structure;
processing data of M channels except the seed channel according to a preset rule, and connecting the data to the first vector structure to obtain a first target matrix; where M is a natural number.
2. The processing method of claim 1, further comprising:
constructing a first domain formed by T of said first target matrices; storing the first domain in a framework unit;
performing preset processing on the T first target matrixes to obtain T second target matrixes; wherein T is a natural number;
packaging the T second target matrixes to obtain T third target matrixes; t of said third target matrices form a first target data set.
3. The processing method according to claim 2, wherein,
the preset processing comprises one of the following methods: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3;
the packaging includes employing a Seq list structure in Scala.
4. The processing method according to any one of claims 1 to 3, wherein,
the raw data includes: time, signal value, conversion formula, character string;
the original data adopts ASAM standard data.
5. The processing method of claim 4, comprising:
reading a first target data set to be optimized, and dividing the first target data set to obtain R partitions, wherein R is a natural number;
dividing the first target data set by primary key into P consecutive equidistant first segments, or dividing it by primary key.
6. The processing method of claim 5, comprising:
fixed-step calculation based on an ECU control strategy; wherein the fixed-step calculation comprises at least one of the following methods: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation.
7. The processing method according to claim 5 or 6, comprising:
analyzing and processing by using a Spark built-in method; and merging and/or de-duplicating the calculation results.
8. A test data processing apparatus comprising:
an input unit, a conversion unit and an output unit; wherein,
the input unit acquires first data and/or second data of the target unit; the first data is the information of the target unit, and the second data is the parameter information of the target unit; the target unit is a measurable unit of the test object;
the conversion unit extracts the original data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number; obtaining a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructing a new time axis according to the seed channel, and filling the new time axis and data values corresponding to the new time axis into a first vector structure;
the output unit processes data of M channels except the seed channel according to a preset rule, is connected to the first vector structure and outputs a first target matrix; where M is a natural number.
9. The apparatus of claim 8, comprising:
constructing a first domain formed by T of said first target matrices; storing the first domain in a framework unit;
performing preset processing on the T first target matrixes to obtain T second target matrixes; wherein T is a natural number;
packaging the T second target matrixes to obtain T third target matrixes; t of said third target matrices form a first target data set.
10. The apparatus of claim 9, wherein:
the preset processing comprises one of the following methods: transposition, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3;
the packaging includes employing a Seq list structure in Scala.
11. The apparatus of any of claims 8-10, wherein:
the raw data includes: time, signal value, conversion formula, character string; the original data adopts ASAM standard data.
12. The apparatus of claim 11, comprising:
reading a first target data set to be optimized, and dividing the first target data set to obtain R partitions, wherein R is a natural number; dividing the first target data set by primary key into P consecutive equidistant first segments, or dividing it by primary key;
fixed-step calculation based on an ECU control strategy; wherein the fixed-step calculation comprises at least one of the following methods: time-sequenced directed graph calculation, undirected graph calculation, and state graph calculation;
analyzing and processing by using a Spark built-in method; and merging and/or de-duplicating the calculation results.
13. A computer-readable storage medium, comprising:
a storage medium body for storing a computer program;
the computer program, when executed by a microprocessor, implements the method according to any of claims 1-7.
14. An analytical instrument, comprising:
the apparatus of any of claims 8-12 and/or the storage medium of claim 13.
15. A vehicle, comprising:
the apparatus of any of claims 8-12 and/or the storage medium of claim 13.
CN202110870972.6A 2021-07-30 2021-07-30 Test data processing method and device, storage medium, instrument and vehicle Active CN113688877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110870972.6A CN113688877B (en) 2021-07-30 2021-07-30 Test data processing method and device, storage medium, instrument and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110870972.6A CN113688877B (en) 2021-07-30 2021-07-30 Test data processing method and device, storage medium, instrument and vehicle

Publications (2)

Publication Number Publication Date
CN113688877A CN113688877A (en) 2021-11-23
CN113688877B CN113688877B (en) 2024-04-16

Family

ID=78578313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110870972.6A Active CN113688877B (en) 2021-07-30 2021-07-30 Test data processing method and device, storage medium, instrument and vehicle

Country Status (1)

Country Link
CN (1) CN113688877B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020118928A1 (en) * 2018-12-11 2020-06-18 东北大学 Distributed time sequence pattern retrieval method for massive equipment operation data
AU2020102350A4 (en) * 2020-09-21 2020-10-29 Guizhou Minzu University A Spark-Based Deep Learning Method for Data-Driven Traffic Flow Forecasting
CN112631903A (en) * 2020-12-18 2021-04-09 平安普惠企业管理有限公司 Task testing method and device, electronic equipment and storage medium
CN112949258A (en) * 2021-02-25 2021-06-11 深圳市元征科技股份有限公司 Data processing method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
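A Spark-based massive data preprocessing and computing technique for domestically produced platforms; 丁派克; 曹芳芳; 王晓玲; 航天控制 (Aerospace Control) (Issue 06); full text *
Design of a matching and calibration system for automotive electronic control systems based on the ASAM standard; 温泉; 张广秀; 张建; 汽车实用技术 (Automobile Applied Technology) (Issue 05); full text *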

Also Published As

Publication number Publication date
CN113688877A (en) 2021-11-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant