CN113688877B

CN113688877B - Test data processing method and device, storage medium, instrument and vehicle

Info

Publication number: CN113688877B
Application number: CN202110870972.6A
Authority: CN
Inventors: 朱泽伟; 张光辉; 高力
Original assignee: United Automotive Electronic Systems Co Ltd
Current assignee: United Automotive Electronic Systems Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2024-04-16
Anticipated expiration: 2041-07-30
Also published as: CN113688877A

Abstract

The invention discloses a test data processing method and device, a storage medium, an instrument and a vehicle, which provide an optimization scheme for data acquisition and preprocessing during testing. The method converts extracted ASAM raw data, through a dedicated algorithm, into RDD/DataFrame data sets that Spark can process directly; an optimization method that improves partitioning behavior is also disclosed. The notable effects are that raw ASAM data from remote calibration is converted into Spark RDD/DataFrame form, yielding data types that fit the standard Spark processing flow, and that Spark's native partitioning is suitably compensated, so that results are more accurate and errors in big-data processing when capturing continuous segments are avoided.

Description

Test data processing method and device, storage medium, instrument and vehicle

Technical Field

The invention belongs to the technical field of virtual instruments, and particularly relates to a test data processing method, a device, a storage medium, an instrument and a vehicle.

Background

For processing massive data, distributed computing based on big-data frameworks such as Spark, Storm and Flink is often adopted. Spark is a general-purpose parallel framework, similar to Hadoop MapReduce, that originated at the AMP Lab of the University of California, Berkeley.

The inventors found that the massive data acquired through remote calibration is MF4-format data conforming to the ASAM (Association for Standardisation of Automation and Measuring Systems) standard. Its form differs greatly from the RDD (Resilient Distributed Dataset) and DataFrame structures defined by Spark, and no existing processing library or function package currently provides the corresponding functionality, so the ASAM data must be processed into data types that conform to the Spark framework.

The inventors also found that the matching and calibration field studies data in a particular way: the analysis often requires time continuity, so data segments must be examined as a whole. Traditional big-data processing tends to focus on individual sample points, so Spark offers only the HashPartitioner (key-based partitioning) and the RangePartitioner (range partitioning), both of which operate on single samples.
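To illustrate the two partitioning styles, the following Python sketch mimics them on integer timestamps. It is not Spark code: the helper names `hash_partition` and `range_partition` are hypothetical, and the modulo rule only approximates how a key-based partitioner assigns keys.

```python
def hash_partition(keys, num_partitions):
    """Key-based partitioning: each key goes to hash(key) mod num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for k in keys:
        parts[hash(k) % num_partitions].append(k)
    return parts


def range_partition(keys, num_partitions):
    """Range partitioning: sorted keys are cut into contiguous blocks."""
    ordered = sorted(keys)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


timestamps = list(range(12))  # hypothetical sample indices 0..11
hash_parts = hash_partition(timestamps, 3)
range_parts = range_partition(timestamps, 3)
```

In this sketch the key-based scheme scatters neighboring timestamps across all partitions, while the range scheme keeps them contiguous but still places fixed boundaries that a time-continuous segment may straddle.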

When a physical phenomenon is studied, the position marked by the circled region (100) in FIG. 1 is the event of interest for mining. When a framework such as Spark is used for analysis, the data volume is so large that the data must be split and stored in partitions on the individual nodes, and an operation on an RDD actually operates on the data of each partition in parallel. As shown in FIG. 1, if Spark's automatic partitioning places the boundary between two partitions (partition 0 and partition 1) inside the target segment, the segment is truncated, which may cause feature loss and data mining failure.
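The truncation problem can be reproduced with a small Python sketch. The definition of an event (at least four consecutive samples above a threshold), the sample values and the function names are assumptions made for illustration only.

```python
def longest_run_above(samples, threshold):
    """Length of the longest run of consecutive samples above threshold."""
    best = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        best = max(best, current)
    return best


def has_event(samples, threshold=5, min_length=4):
    """An 'event' here is min_length consecutive samples above threshold."""
    return longest_run_above(samples, threshold) >= min_length


signal = [0, 1, 0, 2, 8, 9, 9, 8, 1, 0, 0, 1]      # event occupies indices 4..7
partition_0, partition_1 = signal[:6], signal[6:]  # boundary cuts the event

found_whole = has_event(signal)                                  # whole signal
found_split = has_event(partition_0) or has_event(partition_1)   # per partition
```

Scanning the whole signal finds the event, whereas scanning each partition independently does not, because each partition sees only two of the four consecutive samples.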

Disclosure of Invention

The invention discloses a test data processing method and device, a storage medium, an instrument and a vehicle, which provide an optimization scheme for data acquisition and preprocessing during testing. Specifically, the method comprises the following steps.

The raw data of a measurable target unit is acquired as the basic resource for system analysis. It comprises first data and/or second data, wherein the first data is information of the target unit and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object.

N single-channel data are obtained by extracting the raw data in the first data and/or the second data, wherein N is a natural number representing the number of channels. A seed channel is selected from the 1st to the Nth single-channel data according to a preset rule, a new time axis is constructed from the seed channel, and the new time axis and its corresponding data values are filled into a first vector structure. The data of the M channels other than the seed channel are processed according to a preset rule and appended to the first vector structure to obtain a first target matrix; M is likewise a natural number.
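A minimal Python sketch of this step follows. The text leaves the preset rules open, so two assumptions are made here: the seed channel is taken to be the channel with the most samples, and the other channels are aligned to the seed time axis by holding the last known value. All names (`pick_seed`, `hold_last`, `build_matrix`) and sample values are hypothetical.

```python
from bisect import bisect_right


def pick_seed(channels):
    """Assumed rule: the channel with the most samples becomes the seed."""
    return max(channels, key=lambda name: len(channels[name][0]))


def hold_last(times, values, new_times):
    """Resample by holding the last known value at each new timestamp."""
    out = []
    for t in new_times:
        i = bisect_right(times, t) - 1   # last sample at or before t
        out.append(values[i] if i >= 0 else None)
    return out


def build_matrix(channels):
    """Return (column names, columns) aligned on the seed channel's time axis."""
    seed = pick_seed(channels)
    time_axis, seed_values = channels[seed]
    names = ["time", seed]
    columns = [list(time_axis), list(seed_values)]  # the first vector structure
    for name, (times, values) in channels.items():
        if name != seed:
            names.append(name)
            columns.append(hold_last(times, values, time_axis))
    return names, columns


# Hypothetical single-channel data: name -> (timestamps, values)
channels = {
    "rpm": ([0.0, 0.1, 0.2, 0.3], [800, 820, 850, 900]),
    "temp": ([0.0, 0.2], [90.0, 90.5]),
}
names, matrix = build_matrix(channels)
```

Here `rpm` becomes the seed, and the slower `temp` channel is stretched onto the four seed timestamps so that every column of the resulting matrix has the same length.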

Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are encapsulated to obtain T third target matrices; and the T third target matrices form a first target data set.

Further, the preset processing includes at least one of the following: transpose processing, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, and the like; the encapsulation includes using the Seq list structure in Scala.
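The transpose and encapsulation steps can be sketched as follows. The text names Scala structures; this sketch uses Python lists and tuples as stand-ins, and the function names and the two-matrix domain are assumptions for illustration.

```python
def transpose(columns):
    """Turn a column-oriented matrix into a list of row tuples."""
    return [tuple(row) for row in zip(*columns)]


def encapsulate(domain):
    """Transpose each of the T matrices and wrap the results in one list."""
    return [transpose(columns) for columns in domain]


# A hypothetical domain of T = 2 matrices, each with columns (time, rpm)
domain = [
    [[0.0, 0.1], [800, 820]],
    [[0.2, 0.3], [850, 900]],
]
third_matrices = encapsulate(domain)
# Flattening yields one record per timestamp, a row-oriented shape
rows = [row for matrix in third_matrices for row in matrix]
```

Transposing turns each per-channel column layout into per-timestamp records, which is the shape a row-oriented data set generally expects.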

Further, the raw data includes time, signal values, conversion formulas and character strings, and conforms to the ASAM data standard.
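As a rough picture of how these four kinds of raw data might sit together, the sketch below models one extracted channel in Python. The `Channel` class, its field names and the linear form of the conversion formula are assumptions for illustration, not structures taken from the ASAM standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    """Hypothetical container for one extracted channel."""
    name: str              # character string identifying the signal
    times: List[float]     # timestamps in seconds
    raw: List[int]         # raw signal values as recorded
    factor: float = 1.0    # assumed linear conversion: factor * raw + offset
    offset: float = 0.0

    def physical(self) -> List[float]:
        """Apply the conversion formula to every raw value."""
        return [self.factor * r + self.offset for r in self.raw]


coolant = Channel("coolant_temp", [0.0, 0.5, 1.0], [100, 102, 104],
                  factor=0.5, offset=-40.0)
values = coolant.physical()
```

Keeping the conversion formula next to the raw values lets the physical values be computed once, before the channel is aligned with others.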

Further, a first target data set to be optimized is read and split to obtain R partitions, wherein R is a natural number; the primary key is divided into P consecutive, equidistant first segments, or the data is divided by primary key.
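A minimal Python sketch of splitting by primary key into P equidistant segments is given below. It assumes the primary key is a timestamp stored as the first element of each row; the function name `segment_by_key` and the sample rows are hypothetical.

```python
def segment_by_key(rows, num_segments):
    """Split rows into num_segments consecutive, equal-width key ranges.

    Each row is a tuple whose first element is the primary key (a timestamp).
    """
    keys = [row[0] for row in rows]
    start, end = min(keys), max(keys)
    width = (end - start) / num_segments
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        index = int((row[0] - start) / width) if width else 0
        segments[min(index, num_segments - 1)].append(row)  # clamp the last key
    return segments


rows = [(t, t * 10) for t in range(8)]   # hypothetical (time, value) rows
segments = segment_by_key(rows, 4)
segment_keys = [[row[0] for row in segment] for segment in segments]
```

Because the segments are defined on the key range rather than on the row count, every segment covers the same time span even when the sampling density varies.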

Fixed-step calculation based on the ECU control strategy may also be adopted, wherein the fixed-step calculation comprises at least one of the following: directed graph computation with timing, undirected graph computation, and state graph computation.
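One way to picture a fixed-step state graph computation is the Python sketch below. The two-state graph (IDLE/RUNNING), the 10 ms step, the threshold and the function name are all assumptions chosen for illustration; the text does not specify the graph or the step size.

```python
def fixed_step_states(times, values, step, threshold=500):
    """Walk a two-state graph (IDLE/RUNNING) on a fixed time raster.

    The signal is sampled every `step` seconds using the last known value,
    mirroring a control task that executes at a fixed period.
    """
    state, trace = "IDLE", []
    index = 0
    n_steps = int(round((times[-1] - times[0]) / step)) + 1
    for k in range(n_steps):
        t = times[0] + k * step
        # Advance to the last recorded sample at or before the raster time
        while index + 1 < len(times) and times[index + 1] <= t + 1e-9:
            index += 1
        value = values[index]
        # Transitions of the state graph, evaluated once per step
        if state == "IDLE" and value > threshold:
            state = "RUNNING"
        elif state == "RUNNING" and value <= threshold:
            state = "IDLE"
        trace.append((round(t, 6), state))
    return trace


# Hypothetical engine-speed samples recorded at irregular times
trace = fixed_step_states([0.0, 0.015, 0.04], [0, 800, 300], step=0.01)
states = [state for _, state in trace]
```

Evaluating the transitions on a fixed raster, rather than at the irregular recording times, keeps the computed state sequence comparable with what a periodically executed control task would observe.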

Further, analysis and processing are performed using Spark built-in methods, and the calculation results are merged and/or de-duplicated.

The invention also relates to a test data processing device comprising an input unit, a conversion unit and an output unit.
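The merge and de-duplication step can be sketched in plain Python as follows. The sketch assumes each partition reports events as (start, end) tuples and that the same event may be reported more than once, for instance when partitions share boundary samples; the function name and data are hypothetical.

```python
def merge_results(partition_results):
    """Merge per-partition event lists, dropping repeats and keeping time order."""
    seen, merged = set(), []
    for events in partition_results:
        for event in events:
            if event not in seen:
                seen.add(event)
                merged.append(event)
    return sorted(merged)


# Hypothetical (start, end) events; (4, 7) was found by two partitions
results = [[(4, 7)], [(4, 7), (20, 23)], [(30, 31)]]
merged = merge_results(results)
```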

Corresponding to the method, the input unit acquires the first data and/or the second data of the target unit, wherein the first data is information of the target unit, the second data is parameter information of the target unit, and the target unit is a measurable unit of the test object. The conversion unit extracts the raw data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number.

The conversion unit also selects a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructs a new time axis from the seed channel, and fills the new time axis and its corresponding data values into the first vector structure.

The output unit processes the data of the M channels other than the seed channel according to a preset rule, appends them to the first vector structure and outputs a first target matrix; M is likewise a natural number.

Further, the device is configured to construct a first domain formed by T first target matrices; store the first domain in a framework unit; perform preset processing on the T first target matrices to obtain T second target matrices; and encapsulate the T second target matrices to obtain T third target matrices. The T third target matrices form a first target data set, wherein T is a natural number.

Further, the preset processing may include one of the following: transpose processing, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, and the like; the encapsulation involves using the Seq list structure in Scala.

Further, the raw data includes time, signal values, conversion formulas and character strings, and conforms to the ASAM standard.

Further, the first target data set to be optimized is read and split to obtain R partitions, wherein R is a natural number; the primary key is divided into P consecutive, equidistant first segments, or the data is divided by primary key. Fixed-step calculation based on the ECU control strategy is performed, wherein the fixed-step calculation comprises at least one of the following: directed graph computation with timing, undirected graph computation, and state graph computation. Analysis and processing are performed using Spark built-in methods, and the calculation results are merged and/or de-duplicated.

Further, the invention relates to a computer-readable storage medium comprising a storage medium body for storing a computer program; when the computer program is executed by a microprocessor, any of the methods described above can be implemented.

Further, the present invention relates to an analytical instrument and a vehicle, each of which can implement the method, or the operations carried out by any of the above devices or storage media.

In the partition compensation method, a first target data set to be optimized is read and split to obtain R partitions, wherein R is a natural number; the primary key is divided into P consecutive, equidistant first segments, or the data is divided by primary key.
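The text does not spell out how the compensation is carried out. One plausible reading, shown in the Python sketch below purely as an assumption, is to extend each partition with a few samples borrowed from the start of the next one, so that a segment straddling a boundary appears whole in at least one partition. The function names, the overlap length and the event definition are hypothetical.

```python
def partition_with_overlap(samples, num_partitions, overlap):
    """Cut samples into contiguous partitions, each extended by `overlap`
    samples borrowed from the start of the following partition."""
    size = -(-len(samples) // num_partitions)  # ceiling division
    parts = []
    for start in range(0, len(samples), size):
        end = min(start + size + overlap, len(samples))
        parts.append((start, samples[start:end]))  # keep the global offset
    return parts


def find_runs(offset, samples, threshold, min_length):
    """Return (start, end) global indices of runs above threshold."""
    runs, begin = [], None
    for i, value in enumerate(list(samples) + [threshold]):  # sentinel ends a run
        if value > threshold and begin is None:
            begin = i
        elif value <= threshold and begin is not None:
            if i - begin >= min_length:
                runs.append((offset + begin, offset + i - 1))
            begin = None
    return runs


signal = [0, 1, 0, 2, 8, 9, 9, 8, 1, 0, 0, 1]   # event at indices 4..7
parts = partition_with_overlap(signal, 2, overlap=3)
events = sorted({run for off, part in parts
                 for run in find_runs(off, part, threshold=5, min_length=4)})
```

With an overlap of `min_length - 1` samples, the first partition sees the complete event even though the nominal boundary falls in its middle; collecting the runs in a set removes any event reported by two partitions.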

Meanwhile, fixed-step calculation based on the ECU control strategy may be performed, comprising directed graph computation with timing, undirected graph computation and state graph computation; the data may further be analyzed and processed using Spark built-in methods, and the calculation results merged and/or de-duplicated.

It should be noted that the words "first", "second" and the like used in the present invention merely describe the component elements of the technical solution; they do not limit the technical solution and are not to be understood as indicating or implying the importance of the corresponding element. An element qualified by "first", "second" or the like means that the corresponding technical solution includes at least one such element.

The invention has the advantages that:

1. The raw ASAM data from remote calibration is converted into Spark RDD/DataFrame form, yielding data types that fit the standard Spark processing flow.

2. Spark's native partitioning is suitably compensated, so that results are more accurate and errors in big-data processing when capturing continuous segments are avoided (see FIG. 1).

Drawings

To illustrate the technical solution of the present invention more clearly, and to aid understanding of its technical effects, technical features and objects, the invention is described in detail below with reference to the accompanying drawings. The drawings form a necessary part of the specification and, together with the embodiments, serve to illustrate the technical solution without limiting the invention.

Like reference numerals in the drawings denote like parts, in particular:

FIG. 1 is an example of mining information being lost in the prior art;

FIG. 2 is a schematic diagram of a data conversion process according to the present invention;

FIG. 3 is a flow chart of data conversion according to an embodiment of the present invention;

FIG. 4 is a flow chart of a partition optimization method according to an embodiment of the present invention;

FIG. 5 is a flow chart of data optimization according to an embodiment of the present invention;

wherein:

1-acquiring data and parameters, 2-extracting data, 3-reconstructing vectors, 4-constructing a matrix, 5-creating a structural domain, 6-optimizing, 7-subsequent processing and 8-generating a data set;

11-cut-slice, 22-slice compensation, 33-generate challenge samples, 44-assist transport, 55-analyze calculations, 66-merge deduplication.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. Of course, the following specific embodiments are only for explaining the technical solution of the present invention, and are not limiting.

Furthermore, the portions expressed in the examples or the drawings are merely illustrative of the relevant portions of the present invention, and not all of the present invention.

Fig. 2-5 are schematic diagrams of a data conversion process according to the present invention, in which first data and/or second data of a target unit are obtained.

Wherein the first data is the information of the target unit, and the second data is the parameter information of the target unit; the target unit is a measurable unit of the test object; n single-channel data are obtained by extracting the original data in the first data and/or the second data, wherein N is a natural number.

A seed channel is selected from the 1st to the Nth single-channel data according to a preset rule, a new time axis is constructed from the seed channel, and the new time axis and its corresponding data values are filled into a first vector structure. The data of the M channels other than the seed channel are processed according to a preset rule and appended to the first vector structure to obtain a first target matrix, where M is a natural number.

Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are encapsulated to obtain T third target matrices; and the T third target matrices form a first target data set, wherein T is a natural number.

Further, the preset processing includes at least one of the following: transpose processing, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3, and the like; the encapsulation includes employing the Seq list structure in Scala.

Further, the raw data includes time, signal values, conversion formulas and character strings, and conforms to the ASAM standard.

Further, fixed-step calculation based on the ECU control strategy is performed using at least one of the following: directed graph computation with timing, undirected graph computation, and state graph computation; alternatively, analysis and processing are performed using Spark built-in methods. The calculation results are merged and/or de-duplicated.

Corresponding to the method, the invention also relates to a test data processing device, which comprises: an input unit, a conversion unit, and an output unit.

The input unit acquires first data and/or second data of the target unit; the first data is information of the target unit, and the second data is parameter information of the target unit.

The target unit is a measurable unit of the test object. The conversion unit extracts the raw data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number; it selects a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructs a new time axis from the seed channel, and fills the new time axis and its corresponding data values into the first vector structure.

The output unit processes the data of the M channels other than the seed channel according to a preset rule, appends them to the first vector structure and outputs a first target matrix, where M is a natural number.

Further, a first domain formed by T first target matrices is constructed and stored in a framework unit; preset processing is performed on the T first target matrices to obtain T second target matrices; the T second target matrices are encapsulated to obtain T third target matrices; and the T third target matrices form the first target data set.

Further, the preset processing includes one of the following: transpose processing, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3; the encapsulation involves using the Seq list structure in Scala.

Further, the raw data comprises time, signal values, conversion formulas and character strings, and conforms to the ASAM standard.

Further, as shown in FIG. 5, a first target data set to be optimized is read and split to obtain R partitions, where R is a natural number; the primary key is divided into P consecutive, equidistant first segments, or the data is divided by primary key. Fixed-step calculation based on the ECU control strategy is performed, wherein the fixed-step calculation comprises at least one of the following: directed graph computation, undirected graph computation, and state graph computation with timing. Analysis and processing are performed using Spark built-in methods, and the calculation results are merged and/or de-duplicated.

It should be noted that the foregoing examples are merely for clearly illustrating the technical solution of the present invention, and those skilled in the art will understand that the embodiments of the present invention are not limited to the foregoing, and that obvious changes, substitutions or alterations can be made based on the foregoing without departing from the scope covered by the technical solution of the present invention; other embodiments will fall within the scope of the invention without departing from the inventive concept.

Claims

1. A test data processing method, comprising:

acquiring first data and/or second data of a target unit;

wherein the first data is information of the target unit, and the second data is parameter information of the target unit; the target unit is a measurable unit of the test object;

extracting original data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number;

obtaining a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructing a new time axis according to the seed channel, and filling the new time axis and data values corresponding to the new time axis into a first vector structure;

processing data of M channels other than the seed channel according to a preset rule, and connecting the data to the first vector structure to obtain a first target matrix; where M is a natural number.

2. The processing method of claim 1, further comprising:

constructing a first domain formed by T of said first target matrices; storing the first domain in a framework unit;

performing preset processing on the T first target matrices to obtain T second target matrices; wherein T is a natural number;

encapsulating the T second target matrices to obtain T third target matrices; the T third target matrices form a first target data set.

3. The processing method according to claim 2, wherein,

the preset processing comprises one of the following methods: transpose processing, Array, ArrayBuffer, List, ListBuffer, Tuple, Tuple2, Tuple3;

the encapsulation includes employing a Seq list structure in Scala.

4. The processing method according to any one of claims 1 to 3, wherein,

the raw data includes: time, signal value, conversion formula, character string;

the raw data conforms to the ASAM standard.

5. The processing method of claim 4, comprising:

reading a first target data set to be optimized, and dividing the first target data set to obtain R partitions, wherein R is a natural number;

dividing the primary key into P consecutive, equidistant first segments, or dividing by primary key.

6. The processing method of claim 5, comprising:

fixed-step calculation based on an ECU control strategy; wherein the fixed-step calculation comprises at least one of the following methods: directed graph computation, undirected graph computation, and state graph computation with timing.

7. The processing method according to claim 5 or 6, comprising:

analyzing and processing by using Spark built-in methods; and merging and/or de-duplicating the calculation results.

8. A test data processing apparatus comprising:

an input unit, a conversion unit and an output unit; wherein,

the input unit acquires first data and/or second data of the target unit; the first data is the information of the target unit, and the second data is the parameter information of the target unit; the target unit is a measurable unit of the test object;

the conversion unit extracts the raw data in the first data and/or the second data to obtain N single-channel data, wherein N is a natural number; obtains a seed channel from the 1st to the Nth single-channel data according to a preset rule, constructs a new time axis according to the seed channel, and fills the new time axis and data values corresponding to the new time axis into a first vector structure;

the output unit processes data of M channels other than the seed channel according to a preset rule, connects the data to the first vector structure and outputs a first target matrix; where M is a natural number.

9. The apparatus of claim 8, comprising:

10. The apparatus of claim 9, wherein:

the encapsulation includes employing a Seq list structure in Scala.

11. The apparatus of any of claims 8-10, wherein:

the raw data includes: time, signal value, conversion formula, character string; the raw data conforms to the ASAM standard.

12. The apparatus of claim 11, comprising:

reading a first target data set to be optimized, and dividing the first target data set to obtain R partitions, wherein R is a natural number; dividing the primary key into P consecutive, equidistant first segments, or dividing by primary key;

fixed-step calculation based on an ECU control strategy; wherein the fixed-step calculation comprises at least one of the following methods: directed graph computation, undirected graph computation, and state graph computation with timing;

13. A computer-readable storage medium, comprising:

a storage medium body for storing a computer program;

the computer program, when executed by a microprocessor, implements the method according to any of claims 1-7.

14. An analytical instrument, comprising:

the apparatus of any of claims 8-12 and/or the storage medium of claim 13.

15. A vehicle, comprising:

the apparatus of any of claims 8-12 and/or the storage medium of claim 13.