WO2024078339A1 - Fault prediction method, system, and storage medium based on vehicle historical data - Google Patents

Fault prediction method, system, and storage medium based on vehicle historical data

Info

Publication number
WO2024078339A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample data
vehicle
feature
model
fault prediction
Prior art date
Application number
PCT/CN2023/122028
Other languages
English (en)
French (fr)
Inventor
巩鑫
魏浩
Original Assignee
蔚来动力科技(合肥)有限公司
Priority date
2022-10-09
Filing date
2023-09-27
Publication date
2024-04-18
Application filed by 蔚来动力科技(合肥)有限公司
Publication of WO2024078339A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/20: Administration of product repair or maintenance

Definitions

  • the present invention relates to vehicle fault prediction, and in particular to a fault prediction method based on vehicle historical data, a computer system and a computer storage medium for vehicle fault prediction.
  • the present invention proposes a fault prediction method based on vehicle historical data, a computer system for vehicle fault prediction, and a computer storage medium.
  • the fault prediction scheme proposed by the present invention adopts a combination of combined perspectives and combined models to improve the accuracy and comprehensiveness of data mining while improving the accuracy of fault prediction.
  • a fault prediction method based on vehicle historical data comprising: A, extracting multiple sample data sets from the vehicle historical data based on different sample selection strategies; B, performing invalid feature elimination and availability screening for each of the multiple sample data sets respectively; and C, training a combined model using each of the multiple sample data sets, and obtaining a fault prediction result based on the trained combined model.
  • step A includes: A1, receiving vehicle history data of a faulty vehicle cluster and a non-faulty vehicle cluster, wherein the vehicle history data includes historical data of at least one source within the vehicle during a first time period ending at the time point when the fault occurs; A2, extracting positive sample data from the vehicle history data based on a positive sample selection strategy; A3, extracting multiple groups of negative sample data from the vehicle history data based on multiple negative sample selection strategies; and A4, combining the positive sample data with each of the multiple groups of negative sample data respectively to generate multiple sample data sets for fault prediction.
  • the positive sample selection strategy includes: extracting historical data during a second time period with the time point when the fault occurs as the end point from the vehicle historical data of the faulty vehicle cluster, wherein the second time period is smaller than the first time period.
  • the negative sample selection strategy includes at least two of the following items: randomly selecting a subset of non-faulty vehicles from the non-faulty vehicle cluster, and randomly extracting a first group of negative sample data from the vehicle history data of the non-faulty vehicle subset; extracting historical data during a third time period with the starting point of the second time period as the end point from the vehicle history data of the faulty vehicle cluster as a second group of negative sample data, wherein the third time period is shorter than the first time period; and selecting a subset of faulty vehicles whose cumulative operating time is shorter than the minimum fault time from the faulty vehicle cluster, and extracting a third group of negative sample data from the vehicle history data of the faulty vehicle subset.
  • step B includes one of the following: using variance filtering method to eliminate non-divergent features in each sample data set; inputting the sample data sets into the algorithm model respectively to calculate the feature importance of each feature, and performing feature elimination based on the size of the feature importance.
  • step B includes: B1. For each sample data set, calculating the variance of each feature in the sample data set and eliminating features with zero variance from the sample data set.
  • step B includes performing the following operations for each sample data set: B2, inputting the original sample data set into the first algorithm model to obtain the actual feature importance of each feature; B3, randomly shuffling the labels in the original sample data set, and inputting the shuffled sample data set into the first algorithm model to obtain the feature importance of each feature under the random labels; B4, repeating step B3 N times to obtain N feature importances of each feature under the random labels; and B5, comparing the actual feature importance with the N feature importances under the random labels, and performing feature elimination based on the comparison results.
  • step B5 includes: for each feature, calculating the statistical characteristic value of the N feature importances, the statistical characteristic value including the 75% quantile of the N feature importances; calculating the difference between the actual feature importance and the statistical characteristic value; and if the difference is less than or equal to a first threshold, eliminating the feature.
  • step B further includes: using a first algorithm model to perform usability evaluation on each of the multiple sample data sets, and screening out sample data sets whose AUC values are less than or equal to a second threshold.
  • step C includes: C1, using each of the M sample data sets to train the first algorithm model respectively to obtain M trained first algorithm models; C2, using each of the M sample data sets to train the second algorithm model respectively to obtain M trained second algorithm models; C3, using one of the M sample data sets to test the M trained first algorithm models and the M trained second algorithm models to obtain the accuracy of each model; and C4, taking the accuracy of each model as a weight, performing weighted averaging on the prediction results of the M trained first algorithm models and the M trained second algorithm models to obtain a fault prediction result under the combined model.
  • a computer system for vehicle fault prediction comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, any one of the methods described in the first aspect of the present invention is implemented.
  • a computer storage medium comprising instructions, the instructions executing any one of the methods according to the first aspect of the present invention when run.
  • the fault prediction scheme adopts a combined perspective approach (for example, extracting multiple sample data sets based on multiple sample selection strategies) to avoid the bias caused by defining positive and negative samples from a single perspective, thereby being able to more comprehensively and accurately mine fault symptom information in the vehicle's historical operation data.
  • the fault prediction scheme eliminates invalid features in the sample data set based on, for example, variance filtering or feature importance, and further adopts a combined model to improve the performance of the model, thereby achieving a balance between the overall computational complexity and the fault prediction accuracy of the model.
  • FIG. 1 is a flow chart of a fault prediction method 10 based on vehicle historical data according to one or more embodiments of the present invention.
  • FIG. 2 is a block diagram of a computer system 20 for vehicle fault prediction according to one or more embodiments of the present invention.
  • in step S110, multiple sample data sets are extracted from the vehicle historical data based on different sample selection strategies.
  • the present invention adopts a combined-perspective approach: sample data sets are defined from multiple perspectives based on different sample selection strategies (for example, multiple negative sample selection strategies), thereby avoiding the bias caused by defining sample data sets from a single perspective, and mining fault symptom information from the vehicle historical operation data more comprehensively and accurately.
  • in step S110, vehicle history data of a faulty vehicle cluster (e.g., a set of faulty vehicles) and a non-faulty vehicle cluster (e.g., a set of non-faulty vehicles) are first received. Since the actual fault of the vehicle is associated with various physical quantities under a specific environment, there is an objective correlation between the vehicle history data and the vehicle fault, and therefore, the vehicle history data can be used for vehicle fault prediction.
  • the vehicle history data may include the deflection angle of the steering gear, and this data can be used for analyzing steering faults.
  • the vehicle historical data includes historical data from at least one source within the vehicle (e.g., an on-board sensor, an electronic control unit) during a first period ending at the time point when the fault occurs.
  • the vehicle historical data is generated based on the sensor data of the vehicle.
  • the deflection angle of the steering gear can be collected by, for example, an angular deflection sensor.
  • the vehicle historical data can also be collected by, for example, a position sensor, an acceleration sensor, a temperature sensor, etc.
  • the vehicle historical data can also be obtained from other sources.
  • the motor torque can be generated based on the torque command generated by the electronic control unit, so the vehicle historical data can also be collected from a vehicle controller such as an electronic control unit.
  • the sample data extraction strategy can be divided into a positive sample data extraction strategy and a negative sample data extraction strategy.
  • positive sample data can be extracted from vehicle historical data based on the same positive sample selection strategy. For example, historical data during a second time period (the second time period is smaller than the first time period) ending at the time point when the fault occurred can be extracted from the vehicle historical data of the faulty vehicle cluster. Since the probability of data anomalies existing in the period before the fault occurs is the highest, this period of time (i.e., the second time period) can be regarded as a data degradation period, and the data in the data degradation period can be used as a positive sample.
  • the specific duration of the second time period can be determined based on the type of fault combined with business experience, and the present invention does not specifically limit this.
  • multiple groups of negative sample data may be extracted from the vehicle history data based on multiple different negative sample selection strategies.
  • the negative sample selection strategy includes at least two of strategies 1 to 3 described in detail below.
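  • Strategy 1: randomly select a subset of non-faulty vehicles from the non-faulty vehicle cluster, and randomly extract the first set of negative sample data from the vehicle history data of the subset of non-faulty vehicles.
  • Strategy 2: extract, from the vehicle history data of the faulty vehicle cluster, historical data during a third time period that ends at the starting point of the second time period, as the second set of negative sample data, wherein the third time period is shorter than the first time period.
  • Strategy 3: select a subset of faulty vehicles whose cumulative running time is less than the minimum fault time from the faulty vehicle cluster, and extract the third set of negative sample data from the vehicle history data of the faulty vehicle subset.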
  • the above cumulative running time refers to the cumulative running time of the vehicle since it was produced
  • the minimum fault time refers to the minimum time interval from the production of the vehicle to the occurrence of the fault in the faulty vehicle cluster.
  • the above-mentioned positive sample data is respectively combined with each group of multiple groups of negative sample data (for example, the first group of negative sample data, the second group of negative sample data, and the third group of negative sample data) to generate multiple sample data sets (for example, the first sample data set, the second sample data set, and the third sample data set) for fault prediction.
  • in step S120, invalid feature elimination and availability screening are performed for each of the multiple sample data sets. It is understandable that invalid feature elimination and availability screening of the sample data sets bring at least the following benefits: reducing the size of the training data, reducing the overall amount of calculation, and speeding up model training; reducing model complexity and avoiding overfitting; reducing feature input, which makes the model easier to explain; and improving model accuracy.
  • the variance filtering method can be used to eliminate non-divergent features in each sample data set (for example, the first sample data set, the second sample data set, and the third sample data set).
  • a non-divergent feature refers to a feature in which the samples have basically no difference, that is, the feature does not play a role in distinguishing the samples.
  • the divergence of a feature can be judged based on the variance, for example, for each sample data set, the variance of each feature in the sample data set is calculated and features with a variance of zero are eliminated from the sample data set.
  • invalid features can be removed based on the algorithm model.
  • each sample data set can be input into the algorithm model (e.g., LightGBM model, random forest model, XGBoost model) to calculate the feature importance of each feature, and remove features based on the size of the feature importance.
  • the following two invalid feature removal strategies based on the algorithm model are provided.
  • first, each sample data set is input into the first algorithm model, such as the LightGBM model, the feature importance of each feature (for example, including information gain and number of splits) is output, and features with zero information gain or zero feature importance are removed.
  • each sample data set is input into the first algorithm model respectively to obtain the actual feature importance of each feature; the labels in the original sample data set are randomly shuffled, and the shuffled sample data set is input into the first algorithm model again to obtain the feature importance of each feature under the random label; the above shuffled input operation is repeated N times (N is a positive integer) to obtain N feature importances of each feature under the random label; and the actual feature importance is compared with the N feature importances under the random label, and features are eliminated based on the comparison results.
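  • a genuinely stable and important feature loses importance under random labels. Conversely, if the actual feature importance of a feature is low but its importance rises under random labels, the feature is a low-quality feature and needs to be eliminated.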
  • invalid features can be eliminated based on the statistical characteristic value of feature importance. For example, for each feature, the statistical characteristic value of N feature importances is calculated, and the statistical characteristic value can be the 75% quantile, average, or other quantile of the N feature importances; and if the difference between the actual feature importance and the statistical characteristic value is less than or equal to the first threshold (for example, 0), the feature is eliminated, otherwise the feature is retained.
  • the first algorithm model can be used to evaluate the availability of each of the multiple sample data sets, and determine whether the sample data set can be used for subsequent modeling operations based on the evaluation results.
  • the training data of each sample data set can be input into the first algorithm model for model training, and the test effect can be obtained based on the trained model, wherein the test effect can be judged based on the AUC (area under the curve) indicator. For example, if the AUC value is greater than the second threshold (e.g., 0.5), the sample data set can be used for subsequent modeling operations; otherwise (the AUC value is less than or equal to the second threshold), the sample data set is deleted.
  • Time series feature extraction should be performed on each sample data set.
  • Time series feature extraction can be based on time series feature extraction methods known in the art, including but not limited to time series data integrity judgment, time series information feature construction, irrelevant feature and redundant feature elimination, etc., which are not specifically limited in the present invention.
  • in step S130, the combined model is trained using each of the multiple sample data sets, and a fault prediction result is obtained based on the trained combined model.
  • the sample data set as described in step S130 should include time series feature data that has been extracted from time series features.
  • the above-mentioned combined model includes at least two heterogeneous classification models, for example, a random forest model, a LightGBM model, a neural network (NN) model, a K nearest neighbor (KNN) model, etc.
  • in the model training phase of step S130, each of a plurality of (M, where M is a positive integer greater than 1) sample data sets is used to train the first algorithm model (e.g., LightGBM model) to obtain M trained first algorithm models; and the second algorithm model (for example, the random forest model) is trained using each of the M sample data sets to obtain M trained second algorithm models.
  • the M trained first algorithm models and the M trained second algorithm models are tested using one of the M sample data sets (for example, the first sample data set, or the second sample data set, or the third sample data set) to obtain the accuracy of each model.
  • the prediction results of the M trained first algorithm models and the M trained second algorithm models are weighted averaged with the accuracy of each model as the weight to obtain the fault prediction result under the combined model.
  • a combined perspective is adopted (for example, multiple sample data sets are extracted based on multiple sample selection strategies) to avoid the bias caused by defining positive and negative samples from a single perspective, so that the fault symptom information in the historical operation data of the vehicle can be mined more comprehensively and accurately.
  • invalid features in the sample data set are eliminated based on, for example, variance filtering or feature importance, and the performance of the model is further improved by adopting a combined model, so that a balance between the overall computational amount and the fault prediction accuracy of the model can be achieved.
  • FIG. 2 is a block diagram of a computer system 20 for vehicle fault prediction according to an embodiment of the present invention.
  • the computer system 20 includes a memory 210, a processor 220, and a computer program 230 stored in the memory 210 and executable on the processor 220.
  • when the processor 220 executes the computer program 230, the method 10 shown in FIG. 1 can be implemented.
  • the present invention can also be implemented as a computer storage medium in which a program for causing a computer to execute the method 10 shown in FIG. 1 is stored.
  • various computer storage media such as disks (e.g., magnetic disks, optical disks, etc.), cards (e.g., memory cards, optical cards, etc.), semiconductor memories (e.g., ROMs, nonvolatile memories, etc.), and tapes (e.g., magnetic tapes, cassette tapes, etc.) can be used.
  • Software according to the present invention can be stored on one or more computer storage media. It is also contemplated that the software identified herein can be implemented using one or more general or special computers and/or computer systems networked and/or otherwise. Where applicable, the order of the various steps described herein can be changed, combined into composite steps and/or divided into sub-steps to provide the features described herein.

Abstract

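The present application relates to vehicle fault prediction, and in particular to a fault prediction method based on vehicle historical data, a computer system for vehicle fault prediction, and a computer storage medium. The method comprises: A, extracting multiple sample data sets from the vehicle historical data based on different sample selection strategies; B, performing invalid feature elimination and availability screening on each of the multiple sample data sets; and C, training a combined model using each of the multiple sample data sets, and obtaining a fault prediction result based on the trained combined model. The fault prediction scheme proposed by the present application pairs a combined perspective with a combined model, improving the accuracy and comprehensiveness of data mining while also improving the accuracy of fault prediction.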

Description

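Fault prediction method, system, and storage medium based on vehicle historical data
Technical Field
The present invention relates to vehicle fault prediction, and in particular to a fault prediction method based on vehicle historical data, a computer system for vehicle fault prediction, and a computer storage medium.
Background
At present, vehicle faults are mostly handled reactively, with repairs carried out only after a customer complains. With the development of information technology and machine learning, some existing approaches attempt to use big data analysis and predictive models to give early warning of vehicle faults. However, some vehicle faults occur suddenly, without corresponding symptoms beforehand, so the collected vehicle operation data lacks symptom information that characterizes the fault. This leads to problems such as low fault recall and low fault prediction accuracy.
Summary of the Invention
To solve or at least alleviate one or more of the above problems, the present invention proposes a fault prediction method based on vehicle historical data, a computer system for vehicle fault prediction, and a computer storage medium. The proposed fault prediction scheme pairs a combined perspective with a combined model, improving the accuracy and comprehensiveness of data mining while also improving the accuracy of fault prediction.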
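According to a first aspect of the present invention, a fault prediction method based on vehicle historical data is provided, the method comprising: A, extracting multiple sample data sets from the vehicle historical data based on different sample selection strategies; B, performing invalid feature elimination and availability screening on each of the multiple sample data sets; and C, training a combined model using each of the multiple sample data sets, and obtaining a fault prediction result based on the trained combined model.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step A includes: A1, receiving vehicle historical data of a faulty vehicle cluster and a non-faulty vehicle cluster, wherein the vehicle historical data includes historical data from at least one source within the vehicle during a first time period ending at the time point when the fault occurs; A2, extracting positive sample data from the vehicle historical data based on a positive sample selection strategy; A3, extracting multiple groups of negative sample data from the vehicle historical data based on multiple negative sample selection strategies; and A4, combining the positive sample data with each of the multiple groups of negative sample data to generate multiple sample data sets for fault prediction.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, in step A2, the positive sample selection strategy includes: extracting, from the vehicle historical data of the faulty vehicle cluster, historical data during a second time period ending at the time point when the fault occurs, wherein the second time period is shorter than the first time period.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, in step A3, the negative sample selection strategy includes at least two of the following: randomly selecting a subset of non-faulty vehicles from the non-faulty vehicle cluster, and randomly extracting a first group of negative sample data from the vehicle historical data of that subset; extracting, from the vehicle historical data of the faulty vehicle cluster, historical data during a third time period that ends at the starting point of the second time period, as a second group of negative sample data, wherein the third time period is shorter than the first time period; and selecting, from the faulty vehicle cluster, a subset of faulty vehicles whose cumulative running time is shorter than the minimum fault time, and extracting a third group of negative sample data from the vehicle historical data of that subset.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step B includes one of the following: eliminating non-divergent features in each sample data set using variance filtering; or inputting each sample data set into an algorithm model to calculate the feature importance of each feature, and eliminating features based on the magnitude of the feature importance.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step B includes: B1, for each sample data set, calculating the variance of each feature in the sample data set and eliminating features with zero variance from the sample data set.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step B includes performing the following operations for each sample data set: B2, inputting the original sample data set into a first algorithm model to obtain the actual feature importance of each feature; B3, randomly shuffling the labels in the original sample data set, and inputting the shuffled sample data set into the first algorithm model to obtain the feature importance of each feature under random labels; B4, repeating step B3 N times to obtain N feature importances of each feature under random labels; and B5, comparing the actual feature importance with the N feature importances under random labels, and eliminating features based on the comparison result.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step B5 includes: for each feature, calculating a statistical characteristic value of the N feature importances, the statistical characteristic value including the 75% quantile of the N feature importances; calculating the difference between the actual feature importance and the statistical characteristic value; and eliminating the feature if the difference is less than or equal to a first threshold.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step B further includes: performing an availability evaluation on each of the multiple sample data sets using the first algorithm model, and screening out sample data sets whose AUC value is less than or equal to a second threshold.
As an alternative or in addition to the above, in a method according to an embodiment of the present invention, step C includes: C1, training a first algorithm model with each of M sample data sets to obtain M trained first algorithm models; C2, training a second algorithm model with each of the M sample data sets to obtain M trained second algorithm models; C3, testing the M trained first algorithm models and the M trained second algorithm models with one of the M sample data sets to obtain the accuracy of each model; and C4, taking the accuracy of each model as its weight, performing a weighted average of the prediction results of the M trained first algorithm models and the M trained second algorithm models to obtain the fault prediction result under the combined model.
According to a second aspect of the present invention, a computer system for vehicle fault prediction is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the methods according to the first aspect of the present invention.
According to a third aspect of the present invention, a computer storage medium is provided, the computer storage medium comprising instructions which, when run, execute any one of the methods according to the first aspect of the present invention.
On the one hand, the fault prediction scheme according to one or more embodiments of the present invention adopts a combined-perspective approach (for example, extracting multiple sample data sets based on multiple sample selection strategies) to avoid the bias of defining positive and negative samples from a single angle, so that fault symptom information in the vehicle's historical operation data can be mined more comprehensively and accurately.
On the other hand, the fault prediction scheme according to one or more embodiments of the present invention eliminates invalid features from the sample data sets based on, for example, variance filtering or feature importance, and further uses a combined model to improve model performance, thereby balancing the overall amount of computation against the model's fault prediction accuracy.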
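Brief Description of the Drawings
The above and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following description of the various aspects in conjunction with the accompanying drawings, in which identical or similar units are denoted by the same reference numerals. In the drawings:
FIG. 1 is a flow chart of a fault prediction method 10 based on vehicle historical data according to one or more embodiments of the present invention; and
FIG. 2 is a block diagram of a computer system 20 for vehicle fault prediction according to one or more embodiments of the present invention.
Detailed Description
The following detailed description is merely exemplary in nature and is not intended to limit the disclosed technology or its application and uses. Furthermore, there is no intention to be bound by any expressed or implied theory presented in the preceding Technical Field, Background, or the following Detailed Description.
In the following detailed description of the embodiments, numerous specific details are set forth to provide a more thorough understanding of the disclosed technology. However, it will be apparent to a person of ordinary skill in the art that the disclosed technology can be practiced without these specific details. In other instances, well-known features are not described in detail, to avoid unnecessarily complicating the description.
Terms such as "comprising" and "including" indicate that, in addition to the units and steps directly and explicitly stated in the specification, the technical solution of the present invention does not exclude other units and steps that are not directly or explicitly stated. Terms such as "first" and "second" do not indicate an order of the units in time, space, size, or the like, and serve only to distinguish the units from one another. The term "vehicle" and similar terms herein include motor vehicles in general as well as hybrid vehicles, electric vehicles, plug-in hybrid electric vehicles, and the like.
Hereinafter, exemplary embodiments according to the present invention are described in detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of a fault prediction method 10 based on vehicle historical data according to one or more embodiments of the present invention.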
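As shown in FIG. 1, in step S110, multiple sample data sets are extracted from the vehicle historical data based on different sample selection strategies. As noted in the Background section, some vehicle faults occur suddenly, without corresponding symptoms beforehand, so expert analysis alone can hardly find clear fault symptom information in the vehicle's historical operation data, which results in low fault recall and low fault prediction accuracy. The present invention adopts a combined-perspective approach: sample data sets are defined from multiple angles based on different sample selection strategies (for example, multiple negative sample selection strategies), which avoids the bias of defining sample data sets from a single angle and allows fault symptom information to be mined from the vehicle's historical operation data more comprehensively and accurately.
Optionally, in step S110, vehicle historical data of a faulty vehicle cluster (e.g., a set of faulty vehicles) and a non-faulty vehicle cluster (e.g., a set of non-faulty vehicles) is first received. Because an actual vehicle fault is associated with various physical quantities under a specific environment, there is an objective correlation between vehicle historical data and vehicle faults, so the data can be used for vehicle fault prediction. For example, the vehicle historical data may include the deflection angle of the steering gear, which can be used to analyze steering faults.
Illustratively, the vehicle historical data includes historical data from at least one source within the vehicle (e.g., an on-board sensor or an electronic control unit) during a first time period ending at the time point when the fault occurs. In some embodiments of the present application, the vehicle historical data is generated from the vehicle's sensor data. Continuing the example above, the deflection angle of the steering gear can be collected by, for example, an angular deflection sensor. In other examples, vehicle historical data can also be collected by, for example, position sensors, acceleration sensors, and temperature sensors. Vehicle historical data can of course also be obtained from other sources. For example, motor torque can be produced according to a torque command generated by the electronic control unit, so vehicle historical data can also be collected from a vehicle controller such as an electronic control unit.
Sample data extraction strategies can be divided into positive sample extraction strategies and negative sample extraction strategies. Optionally, positive sample data can be extracted from the vehicle historical data based on one and the same positive sample selection strategy. For example, historical data during a second time period ending at the time point when the fault occurs (the second time period being shorter than the first time period) can be extracted from the vehicle historical data of the faulty vehicle cluster. Because data is most likely to be abnormal in the period just before a fault occurs, this period (i.e., the second time period) can be regarded as a data degradation period, and the data within it can be used as positive samples. The specific duration of the second time period can be determined from the fault type together with business experience, and the present invention does not specifically limit it.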
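The positive sample window can be illustrated with a minimal sketch. The record layout (`ts`, `steer_angle`), the function name `positive_window`, and the 7-day length of the second time period are assumptions made for illustration; the source leaves the duration to the fault type and business experience.

```python
from datetime import datetime, timedelta

def positive_window(records, fault_time, second_period):
    """Keep records inside the second time period, which ends at fault_time."""
    start = fault_time - second_period
    return [r for r in records if start <= r["ts"] <= fault_time]

# Hypothetical history: one steering-angle reading per day in January 2023,
# with the fault occurring on 31 January.
fault_time = datetime(2023, 1, 31)
records = [{"ts": datetime(2023, 1, d), "steer_angle": float(d)} for d in range(1, 32)]

# Assumed second time period of 7 days (the data degradation period).
positives = positive_window(records, fault_time, timedelta(days=7))
```

Here the whole month plays the role of the first time period, and only the final days before the fault are labeled as positive samples.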
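Optionally, multiple groups of negative sample data can be extracted from the vehicle historical data based on multiple different negative sample selection strategies. Illustratively, the negative sample selection strategy includes at least two of Strategies 1 to 3 detailed below.
Strategy 1: randomly select a subset of non-faulty vehicles from the non-faulty vehicle cluster, and randomly extract a first group of negative sample data from the vehicle historical data of that subset.
Strategy 2: extract, from the vehicle historical data of the faulty vehicle cluster, historical data during a third time period that ends at the starting point of the second time period, as a second group of negative sample data, wherein the third time period is shorter than the first time period. It can be understood that comparing the data of the same vehicle in its data degradation period (e.g., the second time period) with the data in its non-degradation period (e.g., the third time period) makes it easier to find the real fault information, so the vehicle historical data of the faulty vehicle cluster can also serve as a source of negative sample data.
Strategy 3: select, from the faulty vehicle cluster, a subset of faulty vehicles whose cumulative running time is shorter than the minimum fault time, and extract a third group of negative sample data from the vehicle historical data of that subset. The cumulative running time is the total running time of a vehicle since it came off the production line, and the minimum fault time is the shortest interval, among vehicles in the faulty vehicle cluster, between coming off the production line and the occurrence of the fault.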
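Strategies 1 and 2 can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the function names, and the period lengths are assumptions, and Strategy 3 is left out because its selection depends on how cumulative running time is recorded, which the source does not specify.

```python
import random
from datetime import datetime, timedelta

def strategy_1(non_faulty, n_vehicles, n_rows, rng):
    """Strategy 1: random vehicles from the non-faulty cluster, then random rows from their history."""
    chosen = rng.sample(sorted(non_faulty), n_vehicles)
    pool = [row for vid in chosen for row in non_faulty[vid]]
    return rng.sample(pool, min(n_rows, len(pool)))

def strategy_2(history, fault_time, second_period, third_period):
    """Strategy 2: rows from the third period, which ends where the second period begins."""
    end = fault_time - second_period
    start = end - third_period
    return [row for row in history if start <= row["ts"] < end]

# Hypothetical data: one reading per day for each vehicle in January 2023.
days = [datetime(2023, 1, d) for d in range(1, 32)]
non_faulty = {vid: [{"vid": vid, "ts": t} for t in days] for vid in ("v1", "v2", "v3")}
faulty_history = [{"vid": "f1", "ts": t} for t in days]

rng = random.Random(0)  # fixed seed so the random draw is repeatable
neg_1 = strategy_1(non_faulty, n_vehicles=2, n_rows=10, rng=rng)
neg_2 = strategy_2(faulty_history, datetime(2023, 1, 31), timedelta(days=7), timedelta(days=10))
```

The half-open interval in `strategy_2` keeps the third period from overlapping the second period, so a row is never labeled both positive and negative.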
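Optionally, the positive sample data is combined with each of the multiple groups of negative sample data (for example, the first, second, and third groups of negative sample data) to generate multiple sample data sets for fault prediction (for example, a first, second, and third sample data set).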
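The pairing step can be sketched in a few lines. The label convention (1 for fault, 0 for no fault) and the name `build_datasets` are assumptions for illustration.

```python
def build_datasets(positives, negative_groups):
    """Pair the single positive group with each negative group to form one data set per group."""
    datasets = []
    for negatives in negative_groups:
        # Label convention assumed here: 1 = positive (fault), 0 = negative.
        rows = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
        datasets.append(rows)
    return datasets

# Placeholder sample identifiers stand in for real feature rows.
datasets = build_datasets(["p1", "p2"], [["a"], ["b", "c"], ["d"]])
```

With three negative groups this yields M = 3 sample data sets that share the same positive samples and differ only in how "no fault" is defined.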
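In step S120, invalid feature elimination and availability screening are performed on each of the multiple sample data sets. It can be understood that invalid feature elimination and availability screening of the sample data sets bring at least the following benefits: reducing the size of the training data, lowering the overall amount of computation, and speeding up model training; reducing model complexity and avoiding overfitting; reducing the number of input features, which makes the model easier to explain; and improving model accuracy.
Optionally, in the invalid feature elimination stage, variance filtering can be used to eliminate the non-divergent features in each sample data set (for example, the first, second, and third sample data sets). A non-divergent feature is one on which the samples show essentially no difference, that is, the feature does nothing to distinguish the samples. Illustratively, the divergence of a feature can be judged from its variance: for each sample data set, the variance of each feature in the sample data set is calculated, and features with zero variance are eliminated from the sample data set.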
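A minimal sketch of the zero-variance filter, using only the Python standard library. The feature names and values are hypothetical.

```python
from statistics import pvariance

def drop_zero_variance(rows, names):
    """Remove every feature column whose population variance is zero."""
    keep = [j for j in range(len(names)) if pvariance([r[j] for r in rows]) > 0]
    filtered = [[r[j] for j in keep] for r in rows]
    return filtered, [names[j] for j in keep]

# Hypothetical features: the supply voltage never changes, so it cannot separate samples.
names = ["steer_angle", "supply_voltage", "temperature"]
rows = [
    [1.0, 12.0, 3.5],
    [2.0, 12.0, 3.5],
    [4.0, 12.0, 3.6],
]
filtered_rows, kept_names = drop_zero_variance(rows, names)
```

A small positive cutoff could be used instead of zero to also drop near-constant features, but the source only names the zero-variance case.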
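Alternatively, invalid features can be eliminated based on an algorithm model. Illustratively, each sample data set can be input into an algorithm model (e.g., a LightGBM model, a random forest model, or an XGBoost model) to calculate the feature importance of each feature, and features are eliminated based on the magnitude of the feature importance. Specifically, the following two model-based invalid feature elimination strategies are provided.
First, each sample data set is input into a first algorithm model such as a LightGBM model, the feature importance of each feature (for example, including information gain and number of splits) is output, and features with zero information gain or zero feature importance are eliminated.
Second, each sample data set is input into the first algorithm model to obtain the actual feature importance of each feature; the labels in the original sample data set are randomly shuffled, and the shuffled sample data set is input into the first algorithm model again to obtain the feature importance of each feature under random labels; the shuffle-and-input operation is repeated N times (N being a positive integer) to obtain N feature importances of each feature under random labels; and the actual feature importance is compared with the N feature importances under random labels, and features are eliminated based on the comparison result.
It can be understood that a genuinely stable and important high-quality feature loses importance under random labels. Conversely, if the actual feature importance of a feature is low but its importance rises under random labels, the feature is a low-quality feature and needs to be eliminated. In one example, invalid features can be eliminated based on a statistical characteristic value of the feature importance. For example, for each feature, a statistical characteristic value of the N feature importances is calculated, which can be the 75% quantile, the mean, or another quantile of the N feature importances; if the difference between the actual feature importance and the statistical characteristic value is less than or equal to a first threshold (e.g., 0), the feature is eliminated, and otherwise it is retained.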
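The shuffled-label comparison can be sketched as below. To stay self-contained, the sketch swaps in a simple stand-in importance (the absolute difference of class means) in place of the gain or split importance of a LightGBM-style model; that substitution, the function names, N = 20, and the fixed seed are all assumptions for illustration.

```python
import random
from statistics import mean, quantiles

def importance(X, y):
    """Stand-in importance per feature: |mean(feature | y=1) - mean(feature | y=0)|."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [abs(mean(p[j] for p in pos) - mean(n[j] for n in neg)) for j in range(len(X[0]))]

def select_features(X, y, n_rounds=20, first_threshold=0.0, seed=0):
    """Keep features whose actual importance exceeds the 75% quantile under shuffled labels."""
    rng = random.Random(seed)
    actual = importance(X, y)
    null_runs = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)              # break the link between features and labels
        null_runs.append(importance(X, shuffled))
    kept = []
    for j, actual_j in enumerate(actual):
        q75 = quantiles([run[j] for run in null_runs], n=4)[2]  # third quartile cut point
        if actual_j - q75 > first_threshold:  # difference <= threshold means eliminate
            kept.append(j)
    return kept

# Hypothetical data: feature 0 separates the classes, feature 1 carries no label signal.
X = [[10.0, float(j % 2)] for j in range(10)] + [[0.0, float(j % 2)] for j in range(10)]
y = [1] * 10 + [0] * 10
kept = select_features(X, y)
```

Feature 1 has identical class means under the true labels, so its actual importance is zero and it can never exceed the shuffled-label quantile.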
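Further, in the availability screening stage, the first algorithm model (e.g., a LightGBM model, a random forest model, or an XGBoost model) can be used to evaluate the availability of each of the multiple sample data sets, and the evaluation result determines whether a sample data set can be used for subsequent modeling. Illustratively, the training data of each sample data set can be input into the first algorithm model for model training, and a test result is obtained with the trained model, where the test result can be judged by the AUC (area under the curve) metric. For example, if the AUC value is greater than a second threshold (e.g., 0.5), the sample data set can be used for subsequent modeling; otherwise (the AUC value is less than or equal to the second threshold), the sample data set is deleted.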
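The screening rule can be sketched as follows, following the wording of claim 9 (data sets whose AUC is less than or equal to the second threshold are screened out). The pairwise AUC helper, the data set names, and the scores are illustrative assumptions; in practice the scores would come from the trained first algorithm model.

```python
def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count one half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def usable_datasets(evaluations, second_threshold=0.5):
    """Return the names of data sets whose test AUC is above the second threshold."""
    return [name for name, (scores, labels) in evaluations.items()
            if auc(scores, labels) > second_threshold]

# Hypothetical test-set scores for two candidate sample data sets.
evaluations = {
    "dataset_1": ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),   # ranks every positive first
    "dataset_2": ([0.1, 0.9, 0.2, 0.8], [1, 0, 0, 1]),   # worse than random ranking
}
kept = usable_datasets(evaluations)
```

An AUC of 0.5 corresponds to random ranking, which is why 0.5 is a natural example value for the second threshold.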
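After invalid feature elimination and availability screening, time series feature extraction should be performed on each sample data set in order to capture the temporal information in the vehicle historical data more effectively. Time series feature extraction can be based on methods known in the art, including but not limited to steps such as time series data integrity checks, construction of time series information features, and elimination of irrelevant and redundant features, and the present invention does not specifically limit it.
Next, in step S130, the combined model is trained using each of the multiple sample data sets, and a fault prediction result is obtained based on the trained combined model. It should be understood that the sample data sets referred to in step S130 should include the time series feature data obtained through time series feature extraction. The combined model includes at least two heterogeneous classification models, for example, a random forest model, a LightGBM model, a neural network (NN) model, a K-nearest neighbor (KNN) model, and the like.
In the model training stage of step S130, each of the multiple (M, where M is a positive integer greater than 1) sample data sets is used to train a first algorithm model (e.g., a LightGBM model) to obtain M trained first algorithm models, and each of the M sample data sets is used to train a second algorithm model (e.g., a random forest model) to obtain M trained second algorithm models. Next, one of the M sample data sets (for example, the first, second, or third sample data set) is used to test the M trained first algorithm models and the M trained second algorithm models to obtain the accuracy of each model. In the model integration stage, the accuracy of each model is taken as its weight, and the prediction results of the M trained first algorithm models and the M trained second algorithm models are averaged with those weights to obtain the fault prediction result under the combined model.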
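The integration stage amounts to an accuracy-weighted average of the 2M model outputs. The sketch below assumes each model outputs a fault probability per sample and that accuracy is measured with a 0.5 cutoff; both are illustrative assumptions, and the model outputs are made-up numbers.

```python
def accuracy(probs, labels, cut=0.5):
    """Share of samples whose thresholded probability matches the label."""
    hits = sum((p >= cut) == (t == 1) for p, t in zip(probs, labels))
    return hits / len(labels)

def combine(model_probs, weights):
    """Accuracy-weighted average of per-model fault probabilities, one value per sample."""
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
            for i in range(n_samples)]

# Hypothetical outputs of two trained models on two samples, with assumed test accuracies.
model_probs = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.75, 0.25]
combined = combine(model_probs, weights)
```

Dividing by the sum of the weights keeps the combined value in the same range as the individual predictions, so the more accurate model dominates without rescaling the output.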
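The method 10 according to the present invention adopts a combined-perspective approach (for example, extracting multiple sample data sets based on multiple sample selection strategies) to avoid the bias of defining positive and negative samples from a single angle, so that fault symptom information in the vehicle's historical operation data can be mined more comprehensively and accurately. In addition, the method 10 eliminates invalid features from the sample data sets based on, for example, variance filtering or feature importance, and further uses a combined model to improve model performance, thereby balancing the overall amount of computation against the model's fault prediction accuracy.
FIG. 2 is a block diagram of a computer system 20 for vehicle fault prediction according to an embodiment of the present invention. As shown in FIG. 2, the computer system 20 includes a memory 210, a processor 220, and a computer program 230 stored in the memory 210 and executable on the processor 220. When the processor 220 executes the computer program 230, the method 10 shown in FIG. 1 can be implemented.
In addition, as described above, the present invention can also be implemented as a computer storage medium storing a program for causing a computer to execute the method 10 shown in FIG. 1. Various kinds of computer storage media can be used here, such as disks (e.g., magnetic disks, optical disks), cards (e.g., memory cards, optical cards), semiconductor memories (e.g., ROM, non-volatile memory), and tapes (e.g., magnetic tapes, cassette tapes).
Where applicable, the various embodiments provided by the present invention can be implemented using hardware, software, or a combination of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein can be combined into composite components comprising software, hardware, and/or both without departing from the scope of the present invention. Where applicable, the various hardware components and/or software components set forth herein can be separated into sub-components comprising software, hardware, or both without departing from the scope of the present invention. In addition, where applicable, it is contemplated that software components can be implemented as hardware components, and vice versa.
Software according to the present invention (such as program code and/or data) can be stored on one or more computer storage media. It is also contemplated that the software identified herein can be implemented using one or more general-purpose or special-purpose computers and/or computer systems, networked and/or otherwise. Where applicable, the order of the various steps described herein can be changed, combined into composite steps, and/or separated into sub-steps to provide the features described herein.
The embodiments and examples presented herein are provided to best explain embodiments according to the present invention and its particular applications, and thereby to enable those skilled in the art to make and use the invention. However, those skilled in the art will recognize that the foregoing description and examples are presented for purposes of illustration and example only. The description as set forth is not intended to cover every aspect of the invention or to limit the invention to the precise form disclosed.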

Claims (12)

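  1. A fault prediction method based on vehicle historical data, characterized by comprising:
    A. extracting multiple sample data sets from the vehicle historical data based on different sample selection strategies;
    B. performing invalid feature elimination and availability screening on each of the multiple sample data sets; and
    C. training a combined model using each of the multiple sample data sets, and obtaining a fault prediction result based on the trained combined model.
  2. The method according to claim 1, wherein step A comprises:
    A1. receiving vehicle historical data of a faulty vehicle cluster and a non-faulty vehicle cluster, wherein the vehicle historical data includes historical data from at least one source within the vehicle during a first time period ending at the time point when the fault occurs;
    A2. extracting positive sample data from the vehicle historical data based on a positive sample selection strategy;
    A3. extracting multiple groups of negative sample data from the vehicle historical data based on multiple negative sample selection strategies; and
    A4. combining the positive sample data with each of the multiple groups of negative sample data to generate multiple sample data sets for fault prediction.
  3. The method according to claim 2, wherein in step A2 the positive sample selection strategy comprises: extracting, from the vehicle historical data of the faulty vehicle cluster, historical data during a second time period ending at the time point when the fault occurs, wherein the second time period is shorter than the first time period.
  4. The method according to claim 3, wherein in step A3 the negative sample selection strategy comprises at least two of the following:
    randomly selecting a subset of non-faulty vehicles from the non-faulty vehicle cluster, and randomly extracting a first group of negative sample data from the vehicle historical data of the subset of non-faulty vehicles;
    extracting, from the vehicle historical data of the faulty vehicle cluster, historical data during a third time period that ends at the starting point of the second time period, as a second group of negative sample data, wherein the third time period is shorter than the first time period; and
    selecting, from the faulty vehicle cluster, a subset of faulty vehicles whose cumulative running time is shorter than the minimum fault time, and extracting a third group of negative sample data from the vehicle historical data of the subset of faulty vehicles.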
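  5. The method according to claim 1, wherein step B comprises one of the following:
    eliminating non-divergent features in each sample data set using variance filtering;
    inputting each sample data set into an algorithm model to calculate the feature importance of each feature, and eliminating features based on the magnitude of the feature importance.
  6. The method according to claim 1, wherein step B comprises:
    B1. for each sample data set, calculating the variance of each feature in the sample data set and eliminating features with zero variance from the sample data set.
  7. The method according to claim 1, wherein step B comprises performing the following operations for each sample data set:
    B2. inputting the original sample data set into a first algorithm model to obtain the actual feature importance of each feature;
    B3. randomly shuffling the labels in the original sample data set, and inputting the shuffled sample data set into the first algorithm model to obtain the feature importance of each feature under random labels;
    B4. repeating step B3 N times to obtain N feature importances of each feature under random labels; and
    B5. comparing the actual feature importance with the N feature importances under random labels, and eliminating features based on the comparison result.
  8. The method according to claim 7, wherein step B5 comprises:
    for each feature, calculating a statistical characteristic value of the N feature importances, the statistical characteristic value including the 75% quantile of the N feature importances;
    calculating the difference between the actual feature importance and the statistical characteristic value; and
    eliminating the feature if the difference is less than or equal to a first threshold.
  9. The method according to claim 1, wherein step B further comprises:
    performing an availability evaluation on each of the multiple sample data sets using a first algorithm model, and screening out sample data sets whose AUC value is less than or equal to a second threshold.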
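  10. The method according to claim 1, wherein step C comprises:
    C1. training a first algorithm model with each of M sample data sets to obtain M trained first algorithm models;
    C2. training a second algorithm model with each of the M sample data sets to obtain M trained second algorithm models;
    C3. testing the M trained first algorithm models and the M trained second algorithm models with one of the M sample data sets to obtain the accuracy of each model; and
    C4. taking the accuracy of each model as its weight, performing a weighted average of the prediction results of the M trained first algorithm models and the M trained second algorithm models to obtain the fault prediction result under the combined model.
  11. A computer system for vehicle fault prediction, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 10.
  12. A computer storage medium, characterized in that the computer storage medium comprises instructions which, when run, execute the method according to any one of claims 1 to 10.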
PCT/CN2023/122028 2022-10-09 2023-09-27 Fault prediction method, system, and storage medium based on vehicle historical data WO2024078339A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211225309.1 2022-10-09
CN202211225309.1A CN115563503A (zh) 2022-10-09 Fault prediction method, system, and storage medium based on vehicle historical data

Publications (1)

Publication Number Publication Date
WO2024078339A1 (zh) 2024-04-18

Family

ID=84744535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/122028 WO2024078339A1 (zh) 2022-10-09 2023-09-27 Fault prediction method, system, and storage medium based on vehicle historical data

Country Status (2)

Country Link
CN (1) CN115563503A (zh)
WO (1) WO2024078339A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
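CN115563503A (zh) * 2022-10-09 2023-01-03 蔚来动力科技(合肥)有限公司 Fault prediction method, system, and storage medium based on vehicle historical data
CN116644351B (zh) * 2023-06-13 2024-04-02 石家庄学院 Data processing method and system based on artificial intelligence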

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
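CN113096405A (zh) * 2021-06-10 2021-07-09 天津所托瑞安汽车科技有限公司 Method for constructing a prediction model, and vehicle accident prediction method and device
US20220084335A1 (en) * 2020-09-11 2022-03-17 Nec Laboratories America, Inc. Vehicle intelligence tool for early warning with fault signature
CN114742316A (zh) * 2022-05-05 2022-07-12 中国第一汽车股份有限公司 Overspeed prediction method and device, storage medium, and electronic device
CN115563503A (zh) * 2022-10-09 2023-01-03 蔚来动力科技(合肥)有限公司 Fault prediction method, system, and storage medium based on vehicle historical data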

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
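US20220084335A1 (en) * 2020-09-11 2022-03-17 Nec Laboratories America, Inc. Vehicle intelligence tool for early warning with fault signature
CN113096405A (zh) * 2021-06-10 2021-07-09 天津所托瑞安汽车科技有限公司 Method for constructing a prediction model, and vehicle accident prediction method and device
CN114742316A (zh) * 2022-05-05 2022-07-12 中国第一汽车股份有限公司 Overspeed prediction method and device, storage medium, and electronic device
CN115563503A (zh) * 2022-10-09 2023-01-03 蔚来动力科技(合肥)有限公司 Fault prediction method, system, and storage medium based on vehicle historical data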

Also Published As

Publication number Publication date
CN115563503A (zh) 2023-01-03