CN110990388A - Data processing method and device - Google Patents


Info

Publication number
CN110990388A
Authority
CN
China
Prior art keywords
data
field name
abnormal
processed
name mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911204530.7A
Other languages
Chinese (zh)
Inventor
安春霖
褚波
杜强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Reach Automotive Technology Shenyang Co Ltd
Original Assignee
Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority to CN201911204530.7A
Publication of CN110990388A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Abstract

After data to be processed is obtained, field name mapping is performed on the data to be processed to obtain first data. When abnormal data exists in the first data, abnormal data cleaning is performed on the first data to obtain second data. When data with an abnormal expression exists in the second data, abnormal expression conversion is performed on the second data to obtain third data. Field name mapping normalizes field names, abnormal data cleaning normalizes abnormal data, and abnormal expression conversion normalizes how data is expressed. Performing field name mapping, abnormal data cleaning, and abnormal expression conversion in sequence therefore normalizes the data to be processed and eliminates the differences among data provided by different data sources, which avoids unsatisfactory data processing results caused by those differences.

Description

Data processing method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus.
Background
With the development of internet technology, the amount of data in various fields (e.g., vehicle intelligent control) is growing rapidly. To promote technical development in each field, this big data may be analyzed with big data processing methods, so that the technical means of each field can be optimized based on the analysis results.
However, because the difference between the data provided by different data sources is relatively large, if the data provided by the different data sources is directly used for big data processing, the result of data processing is not ideal.
Disclosure of Invention
In order to solve the technical problems in the prior art, the application provides a data processing method and device, which can normalize data provided by different data sources and avoid the occurrence of an unsatisfactory data processing result caused by differences between data provided by different data sources.
In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:
an embodiment of the present application provides a data processing method, including:
acquiring data to be processed, and performing field name mapping on the data to be processed to obtain first data;
when abnormal data exist in the first data, performing abnormal data cleaning on the first data to obtain second data;
and when it is determined that data with an abnormal expression exists in the second data, performing abnormal expression conversion on the second data to obtain third data.
Optionally, the performing field name mapping on the data to be processed to obtain first data includes:
and performing field name mapping on the data to be processed by utilizing a pre-constructed target field name mapping relation to obtain first data.
Optionally, when the data to be processed is provided by a target data source, the construction process of the target field name mapping relationship includes:
generating a target field name mapping relation according to the field name used by the target data source and the corresponding normalized field name;
and/or,
and performing field name mapping analysis on the historical data provided by the target data source to obtain a target field name mapping relation.
Optionally, the performing abnormal data cleaning on the first data to obtain second data includes:
deleting abnormal data in the first data to obtain second data;
and/or,
and replacing abnormal data in the first data by using the filling data to obtain second data.
Optionally, the abnormal data includes invalid data and/or data exceeding a normal data range.
Optionally, the abnormal expression conversion includes at least one of a numerical offset, a numerical scaling, or a numerical mapping.
Optionally, the method further includes:
performing data processing by using the third data to obtain a data processing result; wherein the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
An embodiment of the present application further provides a data processing apparatus, including:
the data acquisition unit is used for acquiring data to be processed;
the field mapping unit is used for carrying out field name mapping on the data to be processed to obtain first data;
the abnormal cleaning unit is used for cleaning the first data to obtain second data when the abnormal data exists in the first data;
and the expression conversion unit is used for performing abnormal expression conversion on the second data to obtain third data when the second data is determined to have abnormal expression data.
An embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:
the memory is used for storing a computer program;
the processor is used for executing the data processing method provided by the embodiment of the application according to the computer program.
The embodiment of the application also provides a computer-readable storage medium, which is used for storing a computer program, and the computer program is used for executing the data processing method provided by the embodiment of the application.
Compared with the prior art, the embodiment of the application has at least the following advantages:
in the data processing method provided by the embodiment of the application, after the data to be processed is obtained, field name mapping is performed on the data to be processed to obtain first data, and when abnormal data exists in the first data, the first data is subjected to abnormal data cleaning to obtain second data, so that when data expressing abnormality exists in the second data, the second data is subjected to abnormal expression conversion to obtain third data. The field name mapping is used for realizing the field name normalization, the abnormal data cleaning is used for realizing the abnormal data normalization, and the abnormal expression conversion is used for realizing the data expression normalization, so that after the field name mapping, the abnormal data cleaning and the abnormal expression conversion are sequentially carried out on the data to be processed, the normalization of the data to be processed can be realized, the difference among data provided by different data sources is eliminated, and the occurrence of non-ideal data processing results caused by the difference among the data provided by the different data sources can be avoided.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is an overall schematic diagram of a data processing method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus provided in an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Method embodiment one
Referring to fig. 1, the figure is a flowchart of a data processing method according to an embodiment of the present application.
The data processing method provided by the embodiment of the application comprises the following steps of S101-S104:
S101: Acquire data to be processed.
The data to be processed may comprise data provided by at least one data source; moreover, the data source is not limited by the embodiments of the present application, and for example, the data source may include a vehicle factory that provides vehicle data.
According to the embodiment of the application, after the application scene of the data processing method is determined, the data to be processed can be obtained according to the application scene. For example, when the application scenario of the data processing method is related to a vehicle (e.g., abnormality detection, battery pack parameter evaluation correction, remaining duration prediction, remaining driving range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication), then S101 may specifically be: and acquiring vehicle data provided by a plurality of vehicle factories as data to be processed.
S102: Perform field name mapping on the data to be processed to obtain first data.
A field name is the general name of a variable. For example, the field names may be the names of voltage, current, temperature, power, motor speed, and the like. It should be noted that a field name may be a Chinese name, an English abbreviation, or an identifying symbol.
The field name mapping is used to normalize field names in the data.
Based on the above content, in the embodiment of the application, after the to-be-processed data is obtained, the to-be-processed data may be subjected to field name mapping to obtain the first data, so that the field names in the to-be-processed data are normalized, and especially, the field names in the data from different data sources are normalized, so that the field names are unified.
For example, suppose the data to be processed includes voltage data provided by a first, a second, and a third vehicle factory, with the field names volt1, volt2, and volt3 respectively, and the normalized field name for voltage data is volt. S102 may then specifically be: map the field name volt1 of the voltage data provided by the first vehicle factory to volt; map the field name volt2 of the voltage data provided by the second vehicle factory to volt; and map the field name volt3 of the voltage data provided by the third vehicle factory to volt. After mapping, the voltage data from all three vehicle factories carries the field name volt, so the field names of the voltage data in the first data are unified into the normalized field name volt.
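The renaming in this example can be sketched as follows. This is only a minimal illustration: the dictionary-based record layout, the function name map_field_names, and the sample voltage values are assumptions and are not part of the application.

```python
# Hypothetical mapping from each factory's field name to the normalized name.
FIELD_NAME_MAP = {"volt1": "volt", "volt2": "volt", "volt3": "volt"}


def map_field_names(record, mapping):
    """Return a copy of `record` whose field names are normalized.

    Field names that do not appear in `mapping` are kept unchanged.
    """
    return {mapping.get(name, name): value for name, value in record.items()}


# Illustrative records from three vehicle factories (values are made up).
to_be_processed = [{"volt1": 351.2}, {"volt2": 348.9}, {"volt3": 350.4}]
first_data = [map_field_names(r, FIELD_NAME_MAP) for r in to_be_processed]
```

After the call, every record in first_data uses the field name volt, regardless of which factory supplied it.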
In addition, an embodiment of S102 is further provided in this application, and in this embodiment, S102 may specifically be: and performing field name mapping on the data to be processed by utilizing a pre-constructed target field name mapping relation to obtain first data.
The target field name mapping relation is used for recording the mapping relation between each non-normalized field name and each normalized field name.
In addition, the target field name mapping relationship may be constructed in advance, and particularly may be constructed according to an application scenario. For ease of understanding and explanation, three embodiments of the construction process of the target field name mapping relationship will be described below as examples.
As a first implementation manner, the process of constructing the target field name mapping relationship may specifically be: when the data to be processed is provided by the target data source, a target field name mapping relation is generated according to the field name used by the target data source and the corresponding normalized field name.
In this embodiment, to construct the target field name mapping relationship, a field name (e.g., volt1) used by the target data source (e.g., a car manufacturer) and a normalized field name (e.g., volt) corresponding to the field name may be obtained, and then the target field name mapping relationship (e.g., volt1 mapping to volt) may be constructed according to the obtained field name used by the target data source and the normalized field name corresponding to the field name.
The field names used by the target data source may be actively provided by the target data source (e.g., a list of field names used by the vehicle manufacturer). In addition, the normalized field name may be set in advance, and particularly may be set according to an application scenario.
The above is the first embodiment of the construction process of the target field name mapping relationship.
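One possible way to build the mapping relation in the first implementation manner is sketched below. It assumes that the target data source supplies its field names together with the quantity each one describes, and that the normalized field names are preset per quantity; both structures, the names curr and temp, and the function name build_field_name_mapping are hypothetical.

```python
# Hypothetical normalized field names, keyed by the quantity they describe.
NORMALIZED_NAMES = {"voltage": "volt", "current": "curr", "temperature": "temp"}


def build_field_name_mapping(source_fields, normalized_names=NORMALIZED_NAMES):
    """Build {source field name: normalized field name}.

    `source_fields` is assumed to look like {source field name: quantity},
    e.g. a field name list actively provided by a vehicle factory.
    """
    mapping = {}
    for field_name, quantity in source_fields.items():
        if quantity not in normalized_names:
            raise KeyError(f"no normalized field name for quantity {quantity!r}")
        mapping[field_name] = normalized_names[quantity]
    return mapping


# Illustrative field name list from a first vehicle factory.
factory_1_fields = {"volt1": "voltage", "temp1": "temperature"}
target_mapping = build_field_name_mapping(factory_1_fields)
```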
As a second implementation manner, the process of constructing the target field name mapping relationship may specifically be: and performing field name mapping analysis on historical data provided by the target data source to obtain a target field name mapping relation.
In this embodiment, in order to construct a target field name mapping relationship, field name mapping analysis may be performed on historical data provided by a target data source to obtain the target field name mapping relationship, which may specifically include the following three steps:
in the first step, historical data provided by a target data source is obtained.
And secondly, performing field name analysis on historical data provided by the target data source, and determining the use habit of the target data source on the field names.
The usage habit of the target data source for the field names describes which field name the target data source customarily uses for each kind of data. For example, the usage habit of the first vehicle factory for the field name of voltage data is: volt1 is used as the field name of the voltage data.
It should be noted that the embodiments of the present application do not limit the specific implementation of the second step, and the second step may be implemented by using a method such as machine learning, deep learning, or a neural network model.
And thirdly, generating a target field name mapping relation according to the use habit of the target data source on the field names and the normalized field names.
The above is a second embodiment of the construction process of the target field name mapping relationship.
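The application leaves the analysis method in the second step open. As a deliberately simple stand-in (not machine learning, and not the method of the application), the sketch below infers a mapping by string similarity between the field names observed in historical data and the preset normalized field names, using difflib.get_close_matches from the Python standard library. The function name, the cutoff value, and the sample history are assumptions.

```python
from difflib import get_close_matches


def infer_field_name_mapping(historical_records, normalized_names, cutoff=0.6):
    """Guess {observed field name: normalized field name} by string similarity.

    Observed names with no sufficiently similar normalized name are left out
    so that they can be reviewed manually.
    """
    observed = {name for record in historical_records for name in record}
    mapping = {}
    for name in sorted(observed):
        match = get_close_matches(name, normalized_names, n=1, cutoff=cutoff)
        if match:
            mapping[name] = match[0]
    return mapping


# Illustrative historical data from a target data source (values are made up).
history = [{"volt1": 350.0, "temp_a": 21.5}, {"volt1": 349.2, "temp_a": 21.7}]
inferred = infer_field_name_mapping(history, ["volt", "temp", "curr"])
```

String similarity only works when the source's names resemble the normalized ones; a source that uses opaque identifying symbols would need a different analysis, for example one based on value distributions.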
As a third implementation manner, the process of constructing the target field name mapping relationship may specifically be: generating a first field name mapping relation according to the field name used by the target data source and the corresponding normalized field name; performing field name mapping analysis on the historical data provided by the target data source to obtain a second field name mapping relation; and obtaining a target field name mapping relation according to the first field name mapping relation and the second field name mapping relation.
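The application does not state how the first and second field name mapping relations are combined. A minimal sketch, assuming that the relation declared by the data source takes precedence and that disagreements are reported for review, could look like this (merge_mappings and the sample mappings are hypothetical):

```python
def merge_mappings(declared, inferred):
    """Combine two field name mapping relations.

    Assumption: entries from `declared` (first mapping relation) override
    entries from `inferred` (second mapping relation). Field names on which
    the two relations disagree are returned separately.
    """
    merged = dict(inferred)
    merged.update(declared)
    conflicts = {
        name
        for name in declared.keys() & inferred.keys()
        if declared[name] != inferred[name]
    }
    return merged, conflicts


first_mapping = {"volt1": "volt"}
second_mapping = {"volt1": "voltage", "temp_a": "temp"}
merged_mapping, conflicting_names = merge_mappings(first_mapping, second_mapping)
```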
It should be noted that, in the embodiment of the present application, the first data is used to indicate data in which field name normalization is implemented.
S103: When it is determined that abnormal data exists in the first data, perform abnormal data cleaning on the first data to obtain second data.
Abnormal data refers to abnormal data caused by various reasons (e.g., data acquisition equipment failure or information transmission error); also, the abnormal data may include invalid data and/or data exceeding a normal data range.
Here, invalid data refers to data that carries no actual meaning; for example, the invalid data may be "FF". The embodiment of the present application does not limit the type of invalid data; for example, the invalid data may include a default value, that is, the value uploaded by default when data acquisition fails because of a sensor failure or another reason. For example, if the probe temperature is not collected at some moment, -40 may be uploaded as the probe's current temperature value. In that case, -40 is regarded as abnormal data.
Data exceeding the normal data range refers to data that lies outside the normal data range. For example, assuming that the normal data range of the battery temperature on the vehicle is [-40 ℃, 40 ℃], if a first battery temperature in the first data is -50 ℃, it may be determined that the first battery temperature exceeds the normal data range of the battery temperature and is therefore abnormal data.
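The two kinds of abnormal data described above can be checked with a small predicate such as the following sketch. The constant names and the function name is_abnormal are hypothetical; the invalid marker "FF" and the battery temperature range come from the examples in the text. A default value such as -40 can be added to the invalid set for fields where it is known to be a placeholder.

```python
# Example invalid marker from the text; extend per field as needed.
INVALID_VALUES = {"FF"}
# Example normal data range for battery temperature, in degrees Celsius.
NORMAL_RANGE = (-40.0, 40.0)


def is_abnormal(value, invalid_values=INVALID_VALUES, normal_range=NORMAL_RANGE):
    """Return True for invalid data or data outside the normal data range."""
    if value in invalid_values:
        return True
    if not isinstance(value, (int, float)):
        # Non-numeric values cannot be range-checked; treat them as invalid.
        return True
    low, high = normal_range
    return not (low <= value <= high)
```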
The abnormal data cleaning is used for processing abnormal data in the data, so that the processed data does not have the abnormal data.
Based on the above, in the embodiment of the application, after the first data for realizing field name normalization is acquired, it may be determined whether abnormal data exists in the first data, so that when it is determined that abnormal data exists in the first data, the first data is subjected to abnormal data cleaning to obtain the second data, so that abnormal data does not exist in the obtained second data.
In addition, the present application provides a specific implementation of the abnormal data cleaning, and the following description is made with reference to three implementations.
As a first implementation, the abnormal data may be directly deleted, and S103 may include: and when determining that the first data has abnormal data, deleting the abnormal data in the first data to obtain second data.
In this embodiment, after the first data realizing field name normalization is acquired, when it is determined that abnormal data exists in the first data, the abnormal data in the first data may be directly deleted, and the first data from which the abnormal data is deleted may be used as the second data, so that the abnormal data does not exist in the second data. Due to the fact that the second data do not have abnormal data, adverse effects caused by the abnormal data can be effectively avoided when the second data are used for subsequent data processing, and therefore accuracy of data processing is improved.
The above is the first embodiment of the abnormal data cleansing.
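Deletion can be sketched as a simple filter over the records of the first data. The record layout, the field name temp, and the helper names are assumptions made for illustration.

```python
def out_of_range(value, low=-40.0, high=40.0):
    """Hypothetical abnormality check: value lies outside [low, high]."""
    return not (low <= value <= high)


def delete_abnormal(records, field, is_abnormal=out_of_range):
    """Return only the records whose `field` value is not abnormal."""
    return [record for record in records if not is_abnormal(record[field])]


# Illustrative first data (values are made up).
first_data = [{"temp": 21.0}, {"temp": -50.0}, {"temp": 23.5}]
second_data = delete_abnormal(first_data, "temp")
```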
In addition, in some cases, in order to ensure the integrity of data, abnormal data cannot be directly deleted, but is replaced with normal data. Based on this, the present application example further provides a second implementation manner of S103, in this implementation manner, S103 may specifically be: and when the abnormal data exist in the first data, replacing the abnormal data in the first data with the filling data to obtain second data.
The filling data is used to replace abnormal data, and the filling data may be determined according to data related to the abnormal data (for example, data at a historical time, an average value of a plurality of adjacent times, or normal data closest to the abnormal data).
It should be noted that, in the embodiment of the present application, a manner of obtaining padding data is not limited, and any data padding method may be adopted to determine padding data of abnormal data. The padding data may be set in advance, and may be set according to an application scenario.
As can be seen from the above, in the embodiment of the present application, after the first data for realizing field name normalization is acquired, when it is determined that abnormal data exists in the first data, filling data corresponding to the abnormal data may be determined according to the abnormal data and related data thereof (for example, data at historical time, data at multiple adjacent times, and the like), and then the filling data is used to replace the abnormal data in the first data to obtain the second data, so that the abnormal data does not exist in the second data. Due to the fact that the second data do not have abnormal data, adverse effects caused by the abnormal data can be effectively avoided when the second data are used for subsequent data processing, and therefore accuracy of data processing is improved. In addition, the second data is obtained by replacing the abnormal data, so that adverse effects caused by deletion of the abnormal data are avoided, and the accuracy of data processing is improved.
The above is the second embodiment of the abnormal data cleansing.
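As one of many possible filling methods, the sketch below replaces each abnormal value with the mean of the nearest normal value before it and the nearest normal value after it, falling back to whichever of the two exists. The function names and the sample series are assumptions; the application does not prescribe a particular filling method.

```python
def out_of_range(value, low=-40.0, high=40.0):
    """Hypothetical abnormality check: value lies outside [low, high]."""
    return not (low <= value <= high)


def fill_abnormal(values, is_abnormal=out_of_range):
    """Replace abnormal values using the nearest normal neighbours.

    If no normal value exists at all, the abnormal entries become None.
    """
    normal = [None if is_abnormal(v) else v for v in values]
    filled = list(normal)
    for i, value in enumerate(normal):
        if value is not None:
            continue
        before = next(
            (normal[j] for j in range(i - 1, -1, -1) if normal[j] is not None), None
        )
        after = next(
            (normal[j] for j in range(i + 1, len(normal)) if normal[j] is not None),
            None,
        )
        neighbours = [x for x in (before, after) if x is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


second_values = fill_abnormal([20.0, -50.0, 22.0])
```

Unlike deletion, this keeps the length of the series unchanged, which matters when later processing expects one value per sampling time.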
As a third embodiment, S103 may specifically be: when the abnormal data are determined to be suitable for deletion, deleting the abnormal data in the first data to obtain second data; and when the abnormal data are determined to be suitable for replacement, replacing the abnormal data in the first data by using the filling data to obtain second data.
It should be noted that, in the embodiment of the present application, the second data is used to represent data in which field name normalization is performed and no abnormal data exists.
S104: When it is determined that data with an abnormal expression exists in the second data, perform abnormal expression conversion on the second data to obtain third data.
An expression anomaly refers to an expression of data that is not normalized. For example, if the normalized expression is "year/month/day" for the date, the expression form of 2019.11.7 does not match the normalized expression form 2019/11/7, and 2019.11.7 is determined to be data representing an abnormality.
Abnormal expression conversion is used to convert data with an abnormal expression into data with a normalized expression. The embodiments of the present application do not limit its specific implementation; for example, the abnormal expression conversion includes at least one of a numerical offset, a numerical scaling, or a numerical mapping. A numerical offset shifts the data with an abnormal expression by a preset amount; a numerical scaling scales the data with an abnormal expression by a preset multiple; a numerical mapping maps the data with an abnormal expression to data with a normalized expression.
For ease of understanding and explanation, the following description is made in conjunction with three examples.
As a first example, if the normalized temperature expression is degrees Celsius (e.g., 20 ℃), then when first temperature data expressed in degrees Fahrenheit (e.g., 68 °F) exists in the second data, the first temperature data may be converted to degrees Celsius using the formula "Celsius = (Fahrenheit - 32) ÷ 1.8". Specifically: the first temperature data is offset by 32 (that is, 32 is subtracted from the first temperature data) to obtain the offset first temperature data; the offset first temperature data is then divided by 1.8 to obtain the expression of the first temperature data in degrees Celsius (e.g., 20 ℃).
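The first example combines a numerical offset with a numerical scaling. A minimal sketch, with hypothetical function names, is:

```python
def offset_then_scale(value, offset, divisor):
    """Apply a numerical offset (subtract `offset`), then a numerical
    scaling (divide by `divisor`)."""
    return (value - offset) / divisor


def fahrenheit_to_celsius(fahrenheit):
    """Celsius = (Fahrenheit - 32) / 1.8."""
    return offset_then_scale(fahrenheit, 32.0, 1.8)


celsius = fahrenheit_to_celsius(68.0)  # approximately 20 degrees Celsius
```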
As a second example, if the normalized date expression is "year/month/day", when the first date data (e.g., 2019.11.7) whose expression is "year.month.day" exists in the second data, the expression of the first date data may be mapped to the normalized date expression to obtain the normalized expression (e.g., 2019/11/7) of the first date data.
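The date mapping in the second example can be sketched as follows. The function name normalize_date is hypothetical, and the check that the parts form a real calendar date is an added safeguard that the application does not require.

```python
from datetime import date


def normalize_date(text):
    """Map a "year.month.day" expression to "year/month/day".

    Raises ValueError if the text does not have three numeric parts or does
    not describe a real calendar date.
    """
    year, month, day = (int(part) for part in text.split("."))
    date(year, month, day)  # validation only
    return f"{year}/{month}/{day}"


normalized = normalize_date("2019.11.7")
```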
As a third example, if the normalized state of charge expression is "charge or discharge", when there is first state of charge data (e.g., 1) in the second data in an expression of 1 or 2 (where 1 denotes charge and 2 denotes discharge), the expression of the first state of charge data may be mapped to the normalized state of charge expression to obtain the normalized expression (e.g., charge) of the first state of charge data.
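The third example is a plain numerical mapping, which can be sketched as a lookup table. The table and function names are hypothetical; rejecting unknown codes is an assumption about how unexpected input should be handled.

```python
# Codes from the example: 1 denotes charge and 2 denotes discharge.
CHARGE_STATE_MAP = {1: "charge", 2: "discharge"}


def normalize_charge_state(code):
    """Map a numeric charge state code to its normalized expression."""
    if code not in CHARGE_STATE_MAP:
        raise ValueError(f"unknown charge state code: {code!r}")
    return CHARGE_STATE_MAP[code]


state = normalize_charge_state(1)
```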
Based on the above, in the embodiment of the application, after the second data which realizes field name normalization and has no abnormal data is obtained, whether the data which expresses the abnormality exists in the second data is determined, so that when the data which expresses the abnormality exists in the second data is determined, the second data is subjected to abnormality expression conversion to obtain the third data, and the data which expresses the abnormality does not exist in the third data.
In the embodiment of the present application, the third data is used to indicate that field name normalization has been performed, that exception data does not exist, and that data expressing an exception does not exist.
The above is the relevant content of S101-S104.
In addition, after the third data is acquired, data processing in the corresponding application scenario may be performed using the third data. Based on this, the present application provides another implementation manner of the data processing method, and in this implementation manner, in addition to S101 to S104, the method further includes S105:
S105: Perform data processing by using the third data to obtain a data processing result.
The data processing process performed on the third data may be specifically determined according to an application scenario. For example, the data processing in S105 may include at least one of abnormality detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
In the above specific implementation manner of the data processing method provided by the embodiment of the present application, after the data to be processed is obtained, the data to be processed is subjected to field name mapping to obtain first data, and when it is determined that abnormal data exists in the first data, the first data is subjected to abnormal data cleaning to obtain second data, so that when it is determined that data expressing an abnormality exists in the second data, the second data is subjected to abnormal expression conversion to obtain third data. The field name mapping is used for realizing the field name normalization, the abnormal data cleaning is used for realizing the abnormal data normalization, and the abnormal expression conversion is used for realizing the data expression normalization, so that after the field name mapping, the abnormal data cleaning and the abnormal expression conversion are sequentially carried out on the data to be processed, the normalization of the data to be processed can be realized, the difference among data provided by different data sources is eliminated, and the occurrence of non-ideal data processing results caused by the difference among the data provided by the different data sources can be avoided.
Based on the data processing method provided by the above method embodiment, the data processing method is described as a whole below with reference to the accompanying drawings.
Method embodiment two
For the sake of brevity, the second method embodiment is described as a whole, and the parts that are the same as in the first method embodiment are not repeated here.
Referring to fig. 2, the overall schematic diagram of the data processing method provided in the embodiment of the present application is shown.
The data processing method provided by the embodiment of the application comprises the following steps S201-S209:
S201: Acquire data to be processed.
Please refer to S101 for details of S201.
S202: Perform field name mapping on the data to be processed to obtain first data.
Please refer to S102 for details of S202.
S203: Determine whether abnormal data exists in the first data; if so, execute S204; if not, execute S205.
S204: Perform abnormal data cleaning on the first data to obtain second data.
S205: Take the first data as the second data.
S206: Determine whether data with an abnormal expression exists in the second data; if so, execute S207; if not, execute S208.
S207: Perform abnormal expression conversion on the second data to obtain third data.
S208: Take the second data as the third data.
S209: performing data processing by using the third data to obtain a data processing result; wherein the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
In this embodiment, after the data to be processed is acquired, field name mapping may be performed on it to obtain first data, and the first data is then preprocessed (where the preprocessing includes abnormal data cleaning and/or abnormal expression conversion) to obtain third data, which is used for data processing to obtain a data processing result. The third data has normalized field names, contains no abnormal data, and uses a normalized data expression, so the data provided by different data sources is normalized. Data processing based on the third data can therefore produce a more accurate result, and unsatisfactory data processing results caused by differences between data provided by different data sources can be effectively avoided.
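The S201-S209 flow can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation described in the application: the function name `run_pipeline`, the list-of-dicts record layout, and the five callables passed in are all hypothetical stand-ins for the steps named above.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Records = List[Record]


def run_pipeline(
    raw: Records,
    map_fields: Callable[[Records], Records],
    has_abnormal_data: Callable[[Records], bool],
    clean: Callable[[Records], Records],
    has_abnormal_expression: Callable[[Records], bool],
    convert: Callable[[Records], Records],
) -> Records:
    """Return the third data for the acquired data `raw` (S201)."""
    # S202: field name mapping yields the first data.
    first = map_fields(raw)
    # S203-S205: clean only when abnormal data is found;
    # otherwise the first data is taken as the second data.
    second = clean(first) if has_abnormal_data(first) else first
    # S206-S208: convert only when abnormally expressed data is found;
    # otherwise the second data is taken as the third data.
    third = convert(second) if has_abnormal_expression(second) else second
    # S209 (anomaly detection, range prediction, ...) would consume `third`.
    return third
```

Passing the checks and the transformations as separate callables mirrors the way the flow separates the judging steps (S203, S206) from the processing steps (S204, S207).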
Based on the data processing method provided by the above method embodiment, the embodiment of the present application further provides a data processing apparatus, which is explained below with reference to the accompanying drawings.
Device embodiment
For technical details of the data processing apparatus provided in this embodiment, refer to the above method embodiments.
Refer to Fig. 3, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
The data processing apparatus 300 provided in the embodiment of the present application includes:
a data acquisition unit 301 configured to acquire data to be processed;
a field mapping unit 302, configured to perform field name mapping on the to-be-processed data to obtain first data;
an abnormal cleaning unit 303, configured to perform abnormal data cleaning on the first data to obtain second data when it is determined that abnormal data exists in the first data;
and an expression conversion unit 304, configured to, when it is determined that there is data expressing an abnormality in the second data, perform abnormality expression conversion on the second data to obtain third data.
As an embodiment, in order to improve the normalization of data, the field mapping unit 302 is specifically configured to:
perform field name mapping on the data to be processed by using a pre-constructed target field name mapping relationship to obtain the first data.
As an embodiment, in order to improve the normalization of data, when the data to be processed is provided by a target data source, the construction process of the target field name mapping relationship includes:
generating the target field name mapping relationship according to the field names used by the target data source and the corresponding normalized field names;
and/or,
performing field name mapping analysis on historical data provided by the target data source to obtain the target field name mapping relationship.
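As an illustration of the first construction option, the sketch below builds a mapping from a data source's field names to normalized field names and applies it to one record. It is a minimal sketch under assumptions: the function names, the example field names such as `spd` and `batt_v`, and the choice to leave unmapped field names unchanged are all hypothetical and are not specified by the application.

```python
from typing import Any, Dict, List


def build_mapping(source_names: List[str], normalized_names: List[str]) -> Dict[str, str]:
    """Pair each field name used by the data source with its normalized name."""
    if len(source_names) != len(normalized_names):
        raise ValueError("each source field name needs exactly one normalized name")
    return dict(zip(source_names, normalized_names))


def map_field_names(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename the keys of one record; names without a mapping are kept (an assumption)."""
    return {mapping.get(name, name): value for name, value in record.items()}


# Hypothetical mapping for one target data source.
mapping = build_mapping(["spd", "batt_v"], ["speed", "battery_voltage"])
first_data = map_field_names({"spd": 12, "batt_v": 3.7, "ts": 1}, mapping)
```

Keeping one mapping per data source lets records from different sources converge on the same normalized field names before any cleaning or conversion takes place.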
As an embodiment, in order to improve the normalization of the data, the abnormal cleaning unit 303 is specifically configured to:
delete the abnormal data in the first data to obtain the second data;
and/or,
replace the abnormal data in the first data with filling data to obtain the second data.
As an embodiment, in order to improve normalization of data, the abnormal data includes invalid data and/or data exceeding a normal data range.
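The two cleaning options (deletion and replacement with filling data) can be sketched as follows. This is a hedged example, not the application's implementation: treating `None` and NaN as "invalid data", the closed normal range, and the function names are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple


def is_abnormal(value: Optional[float], normal_range: Tuple[float, float]) -> bool:
    """Treat None/NaN as invalid data and values outside the range as abnormal."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return True
    low, high = normal_range
    return not (low <= value <= high)


def clean_by_deletion(
    values: List[Optional[float]], normal_range: Tuple[float, float]
) -> List[Optional[float]]:
    """Option 1: delete the abnormal data."""
    return [v for v in values if not is_abnormal(v, normal_range)]


def clean_by_filling(
    values: List[Optional[float]], normal_range: Tuple[float, float], fill_value: float
) -> List[Optional[float]]:
    """Option 2: replace the abnormal data with filling data."""
    return [fill_value if is_abnormal(v, normal_range) else v for v in values]


# Hypothetical cell voltages with an assumed normal range of 2.5 V to 4.5 V.
voltages = [3.7, None, 99.0, 4.1]
deleted = clean_by_deletion(voltages, (2.5, 4.5))
filled = clean_by_filling(voltages, (2.5, 4.5), fill_value=0.0)
```

Deletion shortens the series, while filling preserves its length, which may matter when downstream processing expects aligned samples.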
As an embodiment, in order to improve normalization of data, the abnormal expression conversion includes at least one of a numerical offset, a numerical scaling, or a numerical mapping.
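The three conversion types can each be illustrated with a one-line function. The sketch below is an assumption-laden example: the function names, the sample values, and the choice to pass unmapped values through unchanged are hypothetical and are not taken from the application.

```python
from typing import Dict


def apply_offset(value: float, offset: float) -> float:
    """Numerical offset: shift a value that a source reports with a fixed bias."""
    return value + offset


def apply_scaling(value: float, factor: float) -> float:
    """Numerical scaling: rescale a value reported in a different unit or resolution."""
    return value * factor


def apply_mapping(value: int, table: Dict[int, int]) -> int:
    """Numerical mapping: translate a source-specific code to the normalized code.

    Values absent from the table are returned unchanged (an assumption).
    """
    return table.get(value, value)


# Hypothetical examples of each conversion.
temperature = apply_offset(65, -40)         # source adds 40 to avoid negative readings
speed = apply_scaling(120, 0.5)             # source reports speed in units of 0.5 km/h
charge_state = apply_mapping(2, {1: 0, 2: 1})  # source uses 1/2 where the norm is 0/1
```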
As an embodiment, in order to improve the normalization of the data, the apparatus 300 further includes:
a data using unit, configured to perform data processing using the third data to obtain a data processing result, where the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
In the above implementation of the data processing apparatus provided in the embodiment of the present application, after the data to be processed is acquired, field name mapping is performed on it to obtain first data. When it is determined that abnormal data exists in the first data, abnormal data cleaning is performed on the first data to obtain second data. When it is determined that abnormally expressed data exists in the second data, abnormal expression conversion is performed on the second data to obtain third data. Field name mapping normalizes field names, abnormal data cleaning normalizes abnormal data, and abnormal expression conversion normalizes how data is expressed. Therefore, performing field name mapping, abnormal data cleaning, and abnormal expression conversion in sequence normalizes the data to be processed, eliminates the differences among data provided by different data sources, and avoids unsatisfactory data processing results caused by those differences.
Based on the data processing method provided by the above method embodiment, the embodiment of the present application further provides an apparatus, which is explained below with reference to the accompanying drawings.
Apparatus embodiment
For technical details of the apparatus provided in this embodiment, refer to the above method embodiments.
Refer to Fig. 4, which is a schematic structural diagram of an apparatus provided in the embodiment of the present application.
The apparatus 400 provided in the embodiment of the present application includes: a processor 401 and a memory 402;
the memory 402 is used for storing computer programs;
the processor 401 is configured to execute, according to the computer program, any implementation of the data processing method provided by the above method embodiments. That is, the processor 401 is configured to perform the following steps:
acquiring data to be processed, and performing field name mapping on the data to be processed to obtain first data;
when it is determined that abnormal data exists in the first data, performing abnormal data cleaning on the first data to obtain second data;
and when it is determined that abnormally expressed data exists in the second data, performing abnormal expression conversion on the second data to obtain third data.
Optionally, the performing field name mapping on the data to be processed to obtain first data includes:
performing field name mapping on the data to be processed by using a pre-constructed target field name mapping relationship to obtain the first data.
Optionally, when the data to be processed is provided by a target data source, the construction process of the target field name mapping relationship includes:
generating the target field name mapping relationship according to the field names used by the target data source and the corresponding normalized field names;
and/or,
performing field name mapping analysis on historical data provided by the target data source to obtain the target field name mapping relationship.
Optionally, the performing abnormal data cleaning on the first data to obtain second data includes:
deleting the abnormal data in the first data to obtain the second data;
and/or,
replacing the abnormal data in the first data with filling data to obtain the second data.
Optionally, the abnormal data includes invalid data and/or data exceeding a normal data range.
Optionally, the abnormal expression conversion includes at least one of a numerical offset, a numerical scaling, or a numerical mapping.
Optionally, the method further includes:
performing data processing using the third data to obtain a data processing result, where the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
The above describes the apparatus 400 provided in the embodiment of the present application.
Based on the data processing method provided by the above method embodiment, the embodiment of the present application further provides a computer-readable storage medium.
Medium embodiment
For technical details of the computer-readable storage medium provided in the medium embodiment, refer to the above method embodiments.
An embodiment of the present application provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute any implementation of the data processing method provided in the foregoing method embodiments. That is, the computer program is used to perform the following steps:
acquiring data to be processed, and performing field name mapping on the data to be processed to obtain first data;
when it is determined that abnormal data exists in the first data, performing abnormal data cleaning on the first data to obtain second data;
and when it is determined that abnormally expressed data exists in the second data, performing abnormal expression conversion on the second data to obtain third data.
Optionally, the performing field name mapping on the data to be processed to obtain first data includes:
performing field name mapping on the data to be processed by using a pre-constructed target field name mapping relationship to obtain the first data.
Optionally, when the data to be processed is provided by a target data source, the construction process of the target field name mapping relationship includes:
generating the target field name mapping relationship according to the field names used by the target data source and the corresponding normalized field names;
and/or,
performing field name mapping analysis on historical data provided by the target data source to obtain the target field name mapping relationship.
Optionally, the performing abnormal data cleaning on the first data to obtain second data includes:
deleting the abnormal data in the first data to obtain the second data;
and/or,
replacing the abnormal data in the first data with filling data to obtain the second data.
Optionally, the abnormal data includes invalid data and/or data exceeding a normal data range.
Optionally, the abnormal expression conversion includes at least one of a numerical offset, a numerical scaling, or a numerical mapping.
Optionally, the method further includes:
performing data processing using the third data to obtain a data processing result, where the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
The above describes the computer-readable storage medium provided in the embodiments of the present application.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refers to any combination of the listed items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
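The "at least one of a, b, or c" reading above corresponds to every non-empty combination of the listed items. As an illustrative check only (the helper name `at_least_one_of` is hypothetical), the seven combinations can be enumerated with Python's standard `itertools.combinations`:

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one_of(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """List every non-empty combination of the given items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


options = at_least_one_of(["a", "b", "c"])
# Three single items, three pairs, and one triple: seven combinations in total.
```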
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Without departing from the scope of the technical solution of the present invention, those skilled in the art can use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the scope of protection of the technical solution of the present invention.

Claims (10)

1. A data processing method, comprising:
acquiring data to be processed, and performing field name mapping on the data to be processed to obtain first data;
when it is determined that abnormal data exists in the first data, performing abnormal data cleaning on the first data to obtain second data;
and when it is determined that abnormally expressed data exists in the second data, performing abnormal expression conversion on the second data to obtain third data.
2. The method according to claim 1, wherein said performing field name mapping on the data to be processed to obtain first data comprises:
performing field name mapping on the data to be processed by using a pre-constructed target field name mapping relationship to obtain the first data.
3. The method according to claim 2, wherein when the data to be processed is provided by a target data source, the construction process of the target field name mapping relationship comprises:
generating the target field name mapping relationship according to the field names used by the target data source and the corresponding normalized field names;
and/or,
performing field name mapping analysis on historical data provided by the target data source to obtain the target field name mapping relationship.
4. The method of claim 1, wherein said performing abnormal data cleaning on the first data to obtain second data comprises:
deleting the abnormal data in the first data to obtain the second data;
and/or,
replacing the abnormal data in the first data with filling data to obtain the second data.
5. The method of claim 4, wherein the abnormal data comprises invalid data and/or data exceeding a normal data range.
6. The method of claim 1, wherein the abnormal expression conversion comprises at least one of a numerical offset, a numerical scaling, or a numerical mapping.
7. The method according to any one of claims 1 to 6, further comprising:
performing data processing using the third data to obtain a data processing result, wherein the data processing includes at least one of anomaly detection, battery pack parameter evaluation correction, remaining duration prediction, remaining range prediction, battery life prediction, vehicle life prediction, driving behavior analysis, or authority authentication.
8. A data processing apparatus, comprising:
a data acquisition unit, configured to acquire data to be processed;
a field mapping unit, configured to perform field name mapping on the data to be processed to obtain first data;
an abnormal cleaning unit, configured to perform abnormal data cleaning on the first data to obtain second data when it is determined that abnormal data exists in the first data;
and an expression conversion unit, configured to perform abnormal expression conversion on the second data to obtain third data when it is determined that abnormally expressed data exists in the second data.
9. An apparatus, comprising a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to perform the method of any one of claims 1-7 in accordance with the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1-7.
CN201911204530.7A 2019-11-29 2019-11-29 Data processing method and device Pending CN110990388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911204530.7A CN110990388A (en) 2019-11-29 2019-11-29 Data processing method and device

Publications (1)

Publication Number Publication Date
CN110990388A true CN110990388A (en) 2020-04-10

Family

ID=70088664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911204530.7A Pending CN110990388A (en) 2019-11-29 2019-11-29 Data processing method and device

Country Status (1)

Country Link
CN (1) CN110990388A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838837A (en) * 2014-02-25 2014-06-04 浙江大学 Remote-sensing metadata integration method based on lexeme templates
CN106649599A (en) * 2016-11-25 2017-05-10 湖南纬度信息科技有限公司 Knowledge service oriented scientific research data processing and predictive analysis platform
CN107239581A (en) * 2017-07-07 2017-10-10 小草数语(北京)科技有限公司 Data cleaning method and device
CN108052665A (en) * 2017-12-29 2018-05-18 深圳市中易科技有限责任公司 A kind of data cleaning method and device based on distributed platform
CN108509485A (en) * 2018-02-07 2018-09-07 深圳壹账通智能科技有限公司 Preprocess method, device, computer equipment and the storage medium of data
CN109766331A (en) * 2018-12-06 2019-05-17 中科恒运股份有限公司 Method for processing abnormal data and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898546A (en) * 2020-07-31 2020-11-06 深圳市商汤科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113254434A (en) * 2021-06-18 2021-08-13 智己汽车科技有限公司 Method, device, equipment and storage medium for cleaning vehicle driving data
US11505199B1 (en) 2021-06-18 2022-11-22 Zhiji Automotive Technology Co., Ltd. Method, apparatus and device for cleaning up vehicle driving data and storage medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination